Discussion:
Reload bug & SRA oddness
Bernd Schmidt
2007-04-20 00:22:43 UTC
Permalink
execute/20040709-2.c is currently failing on the Blackfin at -O3. This
is a problem in reload: these days register allocation can assign the
same hard register to two pseudos even if their lifetimes overlap if one
of them is uninitialized. If you then have an insn where the
uninitialized pseudo dies, and reload needs an output reload, it can
choose the dying register for it. Doing so, it clobbers the other live
pseudo that shares the hardregister with the uninitialized register.

We've fixed this bug twice before, once in find_dummy_reload, and in
push_reload. This adds the same check to combine_reloads, where we can
hit the same problem.

Bootstrapped and regression tested on i686-linux; committed as 123986.

Now, the reason why we have an uninitialized register at all is some
strangeness in the tree optimizers. I'm not too familiar with the
tree-ssa machinery yet, but from what I've seen I blame the SRA pass.
I've attached a reduced version of the testcase. Consider the following
diff between 029t.forwprop1 and 030t.esra:

;; Function fn1D (fn1D) ;; Function fn1D (fn1D)

> Initial instantiation for y
> y.k -> y$k
> Using block-copy for y
> Initial instantiation for x
> Using block-copy for x
> Initial instantiation for D.1631
> Using block-copy for D.1631
>
>
>
> Symbols to be put in SSA form
>
> { y y$k }
>
>
> Incremental SSA update started
>
> Number of blocks in CFG: 3
> Number of blocks to update: 2
>
>
>
fn1D (x) fn1D (x)
{ {
> <unnamed-unsigned:29> y$k;
struct D D.1631; struct D D.1631;
struct D x; struct D x;
struct D y; struct D y;
unsigned int D.1534; unsigned int D.1534;
<unnamed-unsigned:29> D.1533; <unnamed-unsigned:29> D.1533;
unsigned int D.1532; unsigned int D.1532;
unsigned int D.1531; unsigned int D.1531;
<unnamed-unsigned:29> D.1530; <unnamed-unsigned:29> D.1530;

<bb 2>: <bb 2>:
> y.k = y$k_9(D);
y = sD; y = sD;
D.1530_1 = y.k; | y$k_10 = y.k;
> D.1530_1 = y$k_10;
D.1531_2 = (unsigned int) D.1530_1; D.1531_2 = (unsigned int)
D.1530_1;
D.1532_4 = D.1531_2 + x_3(D); D.1532_4 = D.1531_2 + x_3(D);
D.1533_5 = (<unnamed-unsigned:29>) D.1533_5 =
(<unnamed-unsigned:29>)
y.k = D.1533_5; | y$k_11 = D.1533_5;
> y.k = y$k_11;
x = y; x = y;
D.1631 = x; D.1631 = x;
> y.k = y$k_11;
y = D.1631; y = D.1631;
D.1530_6 = y.k; | y$k_12 = y.k;
> D.1530_6 = y$k_12;
D.1534_7 = (unsigned int) D.1530_6; D.1534_7 = (unsigned int)
D.1530_6;
return D.1534_7; return D.1534_7;

} }

What strikes me as odd here is the following piece of code:
y.k = y$k_9(D);
y = sD;
The (D) apparently stands for "default definition" which I take to mean
it's considered uninitialized. I don't know why SRA thinks it needs to
insert this statement, as all of y is overwritten immediately after.
This survives into RTL, there are several uninitialized pseudos with
variable name "y$k", and since we're dealing with bit fields it's not
eliminated - a lot of unneeded code seems to make it into the final output.

I'll try to dig deeper, but someone more familiar with this code
(Diego?) can probably fix it more quickly.


Bernd
--
This footer brought to you by insane German lawmakers.
Analog Devices GmbH Wilhelm-Wagenfeld-Str. 6 80807 Muenchen
Registergericht Muenchen HRB 40368
Geschaeftsfuehrer Thomas Wessel, Vincent Roche, Joseph E. McDonough
Andrew Pinski
2007-04-20 00:29:07 UTC
Permalink
On 4/19/07, Bernd Schmidt <***@t-online.de> wrote:
> execute/20040709-2.c is currently failing on the Blackfin at -O3. This
> is a problem in reload: these days register allocation can assign the
> same hard register to two pseudos even if their lifetimes overlap if one
> of them is uninitialized. If you then have an insn where the
> uninitialized pseudo dies, and reload needs an output reload, it can
> choose the dying register for it. Doing so, it clobbers the other live
> pseudo that shares the hardregister with the uninitialized register.

I reported this same bug back to the original developer of the SRA
patch which caused the regression on spu-elf in
http://gcc.gnu.org/ml/gcc-patches/2007-04/msg01036.html . I did not
look into why it failed after I saw SRA was messing up in the first
place.

Thanks,
Andrew Pinski
Daniel Berlin
2007-04-20 03:19:20 UTC
Permalink
On 4/19/07, Bernd Schmidt <***@t-online.de> wrote: }
>
> What strikes me as odd here is the following piece of code:
> y.k = y$k_9(D);
> y = sD;
> The (D) apparently stands for "default definition" which I take to mean
> it's considered uninitialized.

Yes
> I don't know why SRA thinks it needs to
> insert this statement, as all of y is overwritten immediately after.
SRA does not eliminate dead stores, the dead store elimination does.

SRA also does not try to discover which pieces of the structure are
actually used in the function prior to performing SRA, so it will
generate uninitialized uses when it does element by element copies.

This has been the way SRA has worked forever. I noticed it when I
wrote create_structure_field_vars, since after SRA, we end up with
"used" fields that have no structure field vars.
I mentioned this, and was told it was going to be too expensive to
track which pieces are used. After trying to do the same thing in
create_structure_vars, I agree.

It would also be a mistake to try to do significant optimization of
copy in/out placement directly in SRA. Again, that is why we have PRE
and DSE. If they are not moving/eliminating these things, they need
to be fixed.

> This survives into RTL, there are several uninitialized pseudos with
> variable name "y$k",

This is a deficiency in tree level DSE then, it should have been eliminated.
Bernd Schmidt
2007-04-20 03:53:21 UTC
Permalink
Daniel Berlin wrote:
> On 4/19/07, Bernd Schmidt <***@t-online.de> wrote: }
>>
>> What strikes me as odd here is the following piece of code:
>> y.k = y$k_9(D);
>> y = sD;
>> The (D) apparently stands for "default definition" which I take to mean
>> it's considered uninitialized.
>
> Yes
>> I don't know why SRA thinks it needs to
>> insert this statement, as all of y is overwritten immediately after.
> SRA does not eliminate dead stores, the dead store elimination does.
>
> SRA also does not try to discover which pieces of the structure are
> actually used in the function prior to performing SRA, so it will
> generate uninitialized uses when it does element by element copies.
>
> This has been the way SRA has worked forever. I noticed it when I
> wrote create_structure_field_vars, since after SRA, we end up with
> "used" fields that have no structure field vars.
> I mentioned this, and was told it was going to be too expensive to
> track which pieces are used. After trying to do the same thing in
> create_structure_vars, I agree.
>
> It would also be a mistake to try to do significant optimization of
> copy in/out placement directly in SRA. Again, that is why we have PRE
> and DSE. If they are not moving/eliminating these things, they need
> to be fixed.

I'm not convinced it can't be done with changes to SRA - this behaviour
does seem to have been introduced by a recent patch by aoliva. I'm
testing the appended patch which cures the inefficiency in the testcase
I sent previously and is pretty far along in a bootstrap. Alex, any
particular reason why you made this change in the first place? Andrew,
does this fix your problems too?


Bernd
--
This footer brought to you by insane German lawmakers.
Analog Devices GmbH Wilhelm-Wagenfeld-Str. 6 80807 Muenchen
Registergericht Muenchen HRB 40368
Geschaeftsfuehrer Thomas Wessel, Vincent Roche, Joseph E. McDonough
Andrew Pinski
2007-04-20 04:29:56 UTC
Permalink
On 4/19/07, Bernd Schmidt <***@t-online.de> wrote:
>
> I'm not convinced it can't be done with changes to SRA - this behaviour
> does seem to have been introduced by a recent patch by aoliva. I'm
> testing the appended patch which cures the inefficiency in the testcase
> I sent previously and is pretty far along in a bootstrap. Alex, any
> particular reason why you made this change in the first place? Andrew,
> does this fix your problems too?
>

I will check tomorrow as my spu-elf build is still going. Also the
reload change did not fix up the regressions there.

Thanks,
Andrew Pinski
Alexandre Oliva
2007-04-20 06:56:08 UTC
Permalink
On Apr 20, 2007, Bernd Schmidt <***@t-online.de> wrote:

> Alex, any particular reason why you made this change in
> the first place?

Sure. IIRC, without it, we'd initialize scalarized fields, fail to
copy them into the unscalarized variable, then block-copy the entire
variable with outdated contents. I remember this was one of the last
fixes I added to the patch, so taking it out you're certainly going to
observe regressions on x86_64-linux-gnu and/or i686-linux-gnu.

--
Alexandre Oliva http://www.lsd.ic.unicamp.br/~oliva/
FSF Latin America Board Member http://www.fsfla.org/
Red Hat Compiler Engineer aoliva@{redhat.com, gcc.gnu.org}
Free Software Evangelist oliva@{lsd.ic.unicamp.br, gnu.org}
Bernd Schmidt
2007-04-20 12:13:56 UTC
Permalink
Alexandre Oliva wrote:
> On Apr 20, 2007, Bernd Schmidt <***@t-online.de> wrote:
>
>> Alex, any particular reason why you made this change in
>> the first place?
>
> Sure. IIRC, without it, we'd initialize scalarized fields, fail to
> copy them into the unscalarized variable, then block-copy the entire
> variable with outdated contents.

Doesn't this indicate we're passing the wrong value for the IS_OUTPUT
arg of scalarize_use somewhere?

> I remember this was one of the last
> fixes I added to the patch, so taking it out you're certainly going to
> observe regressions on x86_64-linux-gnu and/or i686-linux-gnu.

Bootstrap completed, and no check-gcc regressions on i686-linux. Can
you send me a testcase that fails when this change is made?


Bernd
--
This footer brought to you by insane German lawmakers.
Analog Devices GmbH Wilhelm-Wagenfeld-Str. 6 80807 Muenchen
Registergericht Muenchen HRB 40368
Geschaeftsfuehrer Thomas Wessel, Vincent Roche, Joseph E. McDonough
Alexandre Oliva
2007-04-20 17:54:46 UTC
Permalink
On Apr 20, 2007, Bernd Schmidt <***@t-online.de> wrote:

> Doesn't this indicate we're passing the wrong value for the IS_OUTPUT
> arg of scalarize_use somewhere?

Not quite. Modifying a bit-field needs to preserve part of the
original value, so it's both input and output.

> Bootstrap completed, and no check-gcc regressions on i686-linux. Can
> you send me a testcase that fails when this change is made?

I'll give your patch a try.

--
Alexandre Oliva http://www.lsd.ic.unicamp.br/~oliva/
FSF Latin America Board Member http://www.fsfla.org/
Red Hat Compiler Engineer aoliva@{redhat.com, gcc.gnu.org}
Free Software Evangelist oliva@{lsd.ic.unicamp.br, gnu.org}
Alexandre Oliva
2007-04-20 18:33:23 UTC
Permalink
On Apr 20, 2007, Bernd Schmidt <***@t-online.de> wrote:

> Bootstrap completed, and no check-gcc regressions on i686-linux. Can
> you send me a testcase that fails when this change is made?

Native configuration is x86_64-pc-linux-gnu
FAIL: gcc.c-torture/execute/20040709-2.c execution, -O3 -g

--
Alexandre Oliva http://www.lsd.ic.unicamp.br/~oliva/
FSF Latin America Board Member http://www.fsfla.org/
Red Hat Compiler Engineer aoliva@{redhat.com, gcc.gnu.org}
Free Software Evangelist oliva@{lsd.ic.unicamp.br, gnu.org}
Bernd Schmidt
2007-04-30 09:15:30 UTC
Permalink
Alexandre Oliva wrote:
> On Apr 20, 2007, Bernd Schmidt <***@t-online.de> wrote:
>
>> Bootstrap completed, and no check-gcc regressions on i686-linux. Can
>> you send me a testcase that fails when this change is made?
>
> Native configuration is x86_64-pc-linux-gnu
> FAIL: gcc.c-torture/execute/20040709-2.c execution, -O3 -g

Okay, I've set up an x86-64 install and debugged this a little further.
I've attached a reduced testcase for this.

What's happening is that in esra, the following happens to function retmeA:
retmeA (x) retmeA (x)
{ {
> <unnamed-unsigned:11> x$k;
> unsigned char x$B0F5;
struct A D.2000; struct A D.2000;

<bb 2>: <bb 2>:
D.2000 = x; | x$k_1 = x.k;
> x$B0F5_2 = BIT_FIELD_REF <x,
> D.2000.k = x$k_1;
> BIT_FIELD_REF <D.2000, 5, 0>
return D.2000; return D.2000;

} }

It is then inlined into fn1A, where esra runs again, and when scanning
the function, we run into the new BIT_FIELD_REFs. When it tries to deal
with the following stmt:

BIT_FIELD_REF <D.4935, 5, 0> = BIT_FIELD_REF <x$B0F5_11, 5, 0>;

it just calls scalarize_use with D.4935 and is_output==true, causing us
to insert two additional stmts:

+ D.4935.k = SR.282_16;
BIT_FIELD_REF <D.4935, 5, 0> = BIT_FIELD_REF <x$B0F5_11, 5, 0>;
+ SR.282_17 = D.4935.k;

Both of these are unnecessary (and inefficient), as the BIT_FIELD_REF
does not touch "k". With my patch to restore is_output behaviour, we
only eliminate the first, but not the second.

IMO the real problem is that SRA doesn't know how to deal with these
BIT_FIELD_REFs properly - possibly something similar to how we deal with
ARRAY_RANGE_REF could be used? I'll keep investigating this, but for
the moment there are two more points I'd like to make:

* 20040709-2.c (which is very bitfield intensive) at -O3 has a pretty
large code size regression at least on x86-64 when compared to before
your SRA patch.
* I've seen no serious attempt to investigate or fix the regressions
Andrew Pinski has reported.

As a consequence of all these problems, I'd like to see this patch
reverted for now until it we know how to do it properly.


Bernd
--
This footer brought to you by insane German lawmakers.
Analog Devices GmbH Wilhelm-Wagenfeld-Str. 6 80807 Muenchen
Sitz der Gesellschaft Muenchen, Registergericht Muenchen HRB 40368
Geschaeftsfuehrer Thomas Wessel, William A. Martin, Margaret Seif
Alexandre Oliva
2007-04-30 17:21:38 UTC
Permalink
On Apr 30, 2007, Bernd Schmidt <***@t-online.de> wrote:

> As a consequence of all these problems, I'd like to see this patch
> reverted for now until it we know how to do it properly.

Sounds like a reasonable trade-off. I'm sorry that I haven't had time
to look into this throughout the past two weeks. I was finally able
to start looking into this yesterday, but I haven't got anywhere so
far.

--
Alexandre Oliva http://www.lsd.ic.unicamp.br/~oliva/
FSF Latin America Board Member http://www.fsfla.org/
Red Hat Compiler Engineer aoliva@{redhat.com, gcc.gnu.org}
Free Software Evangelist oliva@{lsd.ic.unicamp.br, gnu.org}
Alexandre Oliva
2007-04-30 17:40:40 UTC
Permalink
On Apr 30, 2007, Alexandre Oliva <***@redhat.com> wrote:

> On Apr 30, 2007, Bernd Schmidt <***@t-online.de> wrote:
>> As a consequence of all these problems, I'd like to see this patch
>> reverted for now until it we know how to do it properly.

> Sounds like a reasonable trade-off. I'm sorry that I haven't had time
> to look into this throughout the past two weeks. I was finally able
> to start looking into this yesterday, but I haven't got anywhere so
> far.

Here's the reversal patch I'm checking in.
Alexandre Oliva
2007-05-01 15:38:10 UTC
Permalink
On Apr 30, 2007, Bernd Schmidt <***@t-online.de> wrote:

> As a consequence of all these problems, I'd like to see this patch
> reverted for now until it we know how to do it properly.

Here's a new patch that I hope will correct all the regressions,
including those on platforms I don't have access to. Bootstrapped and
regtested on x86_64-linux-gnu. Ok to install?
Eric Botcazou
2007-05-01 15:48:30 UTC
Permalink
> Here's a new patch that I hope will correct all the regressions,
> including those on platforms I don't have access to.

Including the Ada problems? They are quite annoying and break our testers.

--
Eric Botcazou
Alexandre Oliva
2007-05-04 00:19:38 UTC
Permalink
On May 1, 2007, Eric Botcazou <***@adacore.com> wrote:

>> Here's a new patch that I hope will correct all the regressions,
>> including those on platforms I don't have access to.

> Including the Ada problems?

What problems? You mentioned some unexpected transformations in
another e-mail, but they were by design. Is there anything else?

--
Alexandre Oliva http://www.lsd.ic.unicamp.br/~oliva/
FSF Latin America Board Member http://www.fsfla.org/
Red Hat Compiler Engineer aoliva@{redhat.com, gcc.gnu.org}
Free Software Evangelist oliva@{lsd.ic.unicamp.br, gnu.org}
Diego Novillo
2007-05-01 15:53:45 UTC
Permalink
Alexandre Oliva wrote on 05/01/07 11:38:

> PR middle-end/22156
> 2007-04-05 Alexandre Oliva <***@redhat.com>
> * tree-sra.c (struct sra_elt): Add in_bitfld_block.
> (sra_hash_tree): Handle BIT_FIELD_REFs.
> (sra_elt_hash): Don't hash bitfld blocks.
> (sra_elt_eq): Skip them in parent compares as well. Handle
> BIT_FIELD_REFs.
> (sra_walk_expr): Use a variable as both input and output if it's
> referenced in a BIT_FIELD_REF in a lhs.
> (build_element_name_1): Handle BIT_FIELD_REFs.
> (instantiate_element): Don't warn for any element whose parent
> is used as a whole.
> (instantiate_missing_elements_1): Return the sra_elt.
> (canon_type_for_field): New.
> (try_instantiate_multiple_fields): New. Initialize align.
> (instantiate_missing_elemnts): Use them.
> (generate_one_element_ref): Handle BIT_FIELD_REFs.
> (REPLDUP): New.
> (sra_build_elt_assignment): New. Avoid need for needless
> initialization of maxalign and minalign.
> (generate_copy_inout): Use them.
> (generate_element_copy): Likewise. Handle bitfld differences.
> (generate_element_zero): Don't recurse for blocks. Use
> sra_build_elt_assignment.
> (generate_one_element_int): Take elt instead of var. Use
> sra_build_elt_assignment.
> (generate_element_init_1): Adjust.
> (scalarize_use): Re-expand assignment to BIT_FIELD_REF of
> gimple_reg. Use REPLDUP.
> (scalarize_copy): Use REPLDUP.
> (scalarize_ldst): Move assert before dereference.
> (dump_sra_elt_name): Handle BIT_FIELD_REFs.

OK if testing included Ada.
Eric Botcazou
2007-05-01 16:07:37 UTC
Permalink
> OK if testing included Ada.

Ada bootstrap is broken because of Ian's patch. :-)

--
Eric Botcazou
Arnaud Charlet
2007-05-01 16:51:13 UTC
Permalink
> > OK if testing included Ada.
>
> Ada bootstrap is broken because of Ian's patch. :-)

So I'd suggest testing with Ada and reverting Ian's patch.

We've got two or three Ada breakage in a row and things are getting a bit
out of control, so it would indeed be nice to get Ada back in good
shape as it was a few weeks ago.

Arno
Arnaud Charlet
2007-05-01 16:54:45 UTC
Permalink
> So I'd suggest testing with Ada and reverting Ian's patch.

I meant: 'revert locally Ian's patch for testing with Ada'

I didn't mean (yet, hoping for a fix rapidly): reverting Ian's patch on the
repository and testing with Ada.

Arno
Andreas Schwab
2007-05-01 17:31:11 UTC
Permalink
Eric Botcazou <***@adacore.com> writes:

>> OK if testing included Ada.
>
> Ada bootstrap is broken because of Ian's patch. :-)

There is a patch in PR31739.

Andreas.

--
Andreas Schwab, SuSE Labs, ***@suse.de
SuSE Linux Products GmbH, Maxfeldstraße 5, 90409 Nürnberg, Germany
PGP key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."
Roman Zippel
2007-05-01 16:24:24 UTC
Permalink
Hi,

On Tue, 1 May 2007, Alexandre Oliva wrote:

> On Apr 30, 2007, Bernd Schmidt <***@t-online.de> wrote:
>
> > As a consequence of all these problems, I'd like to see this patch
> > reverted for now until it we know how to do it properly.
>
> Here's a new patch that I hope will correct all the regressions,
> including those on platforms I don't have access to. Bootstrapped and
> regtested on x86_64-linux-gnu. Ok to install?

This still breaks this m68k test case:
http://gcc.gnu.org/ml/gcc/2007-04/msg00830.html

BTW the previous patch broke the m68k bootstrap and this doesn't make me
very confident this one will do better.

A specific comment about the patch, it might be better to keep combining
multiple fields, if they don't match the alignment exactly (e.g. as in the
above test case).
Even better would be if the per field copy of the sra pass would be
completely suppressed if the structure is already likely to be in a
register anyway.

bye, Roman
Alexandre Oliva
2007-05-04 04:07:29 UTC
Permalink
On May 1, 2007, Roman Zippel <***@linux-m68k.org> wrote:

> Hi,
> On Tue, 1 May 2007, Alexandre Oliva wrote:

>> On Apr 30, 2007, Bernd Schmidt <***@t-online.de> wrote:
>>
>> > As a consequence of all these problems, I'd like to see this patch
>> > reverted for now until it we know how to do it properly.
>>
>> Here's a new patch that I hope will correct all the regressions,
>> including those on platforms I don't have access to. Bootstrapped and
>> regtested on x86_64-linux-gnu. Ok to install?

> This still breaks this m68k test case:
> http://gcc.gnu.org/ml/gcc/2007-04/msg00830.html

> BTW the previous patch broke the m68k bootstrap and this doesn't make me
> very confident this one will do better.

Indeed, there was a serious bug in optimizing assignments to
BIT_FIELD_REFs that covered all fields scalarized into a single
variable, that affected only big-endian machines.

> A specific comment about the patch, it might be better to keep combining
> multiple fields, if they don't match the alignment exactly (e.g. as in the
> above test case).

In the patch below, alignment for access is inferred not from the
field, but from the enclosing object. This should get us much better
access patterns for small objects that fit in registers, and use the
declaration alignment for other wider objects (that are less likely to
be scalarized anyway).

After this change, we get the optimal code that you quoted, even
without -malign-int.


Would you please give it a spin on m68k? Bootstrap on
x86_64-linux-gnu hasn't completed yet, but it's already built Ada in
stage2, so I'm somewhat confident that this might work ;-)

If bootstrap and regression-testing completes, ok to install?
Alexandre Oliva
2007-05-04 05:24:27 UTC
Permalink
On May 4, 2007, Alexandre Oliva <***@redhat.com> wrote:

> Would you please give it a spin on m68k? Bootstrap on
> x86_64-linux-gnu hasn't completed yet, but it's already built Ada in
> stage2, so I'm somewhat confident that this might work ;-)

Err... Sorry, I failed to refresh the patch. This one should fare
better.
Roman Zippel
2007-05-05 18:18:51 UTC
Permalink
Hi,

On Fri, 4 May 2007, Alexandre Oliva wrote:

> > Would you please give it a spin on m68k? Bootstrap on
> > x86_64-linux-gnu hasn't completed yet, but it's already built Ada in
> > stage2, so I'm somewhat confident that this might work ;-)
>
> Err... Sorry, I failed to refresh the patch. This one should fare
> better.

This now doesn't even survive stage2. I've put an example of a miscompiled
function at www.xs4all.nl/~zippel/tree-cfg.i
It crashes at the memory access after 'sub.l %a0,%a0'
In the initial RTL output I also see some rather complex reg:DI
operations instead of simple 32bit operations.

bye, Roman
Alexandre Oliva
2007-05-06 05:13:21 UTC
Permalink
On May 5, 2007, Roman Zippel <***@linux-m68k.org> wrote:

> This now doesn't even survive stage2.

This one should fare better. It's completed bootstrap and regression
testing on x86_64-linux-gnu (--enable-languages=all,ada, as usual)

Ok to install?

:ADDPATCH sra:

> In the initial RTL output I also see some rather complex reg:DI
> operations instead of simple 32bit operations.

I don't see any simple way to avoid this without taking some
machine-specific "preferred access mode" into account.

It might make sense to test whether, instead of a DImode variable, a
pair of SImode variables would do (i.e., no bit-fields crossing the
SImode boundary), as long as registers are known to hold SImode but
not DImode. But is there a generic way to test for such "efficiency
boundaries"? And, if not, does it make sense to introduce yet another
machine hook to this end?
Bernd Schmidt
2007-05-06 12:12:53 UTC
Permalink
Alexandre Oliva wrote:
> It might make sense to test whether, instead of a DImode variable, a
> pair of SImode variables would do (i.e., no bit-fields crossing the
> SImode boundary), as long as registers are known to hold SImode but
> not DImode. But is there a generic way to test for such "efficiency
> boundaries"?

I'd start with

GET_MODE_SIZE (foo) <= UNITS_PER_WORD

as a first order approximation.


Bernd
--
This footer brought to you by insane German lawmakers.
Analog Devices GmbH Wilhelm-Wagenfeld-Str. 6 80807 Muenchen
Sitz der Gesellschaft Muenchen, Registergericht Muenchen HRB 40368
Geschaeftsfuehrer Thomas Wessel, William A. Martin, Margaret Seif
Alexandre Oliva
2007-05-06 14:27:08 UTC
Permalink
On May 6, 2007, Bernd Schmidt <***@t-online.de> wrote:

> Alexandre Oliva wrote:
>> It might make sense to test whether, instead of a DImode variable, a
>> pair of SImode variables would do (i.e., no bit-fields crossing the
>> SImode boundary), as long as registers are known to hold SImode but
>> not DImode. But is there a generic way to test for such "efficiency
>> boundaries"?

> I'd start with

> GET_MODE_SIZE (foo) <= UNITS_PER_WORD

> as a first order approximation.

This is sort of equivalent. Ok to install on top of the other patch?
Alexandre Oliva
2007-05-06 15:00:47 UTC
Permalink
On May 6, 2007, Alexandre Oliva <***@redhat.com> wrote:

> On May 6, 2007, Bernd Schmidt <***@t-online.de> wrote:
>> Alexandre Oliva wrote:
>>> It might make sense to test whether, instead of a DImode variable, a
>>> pair of SImode variables would do (i.e., no bit-fields crossing the
>>> SImode boundary), as long as registers are known to hold SImode but
>>> not DImode. But is there a generic way to test for such "efficiency
>>> boundaries"?

> This is sort of equivalent. Ok to install on top of the other patch?

> * tree-sra.c (try_instantiate_multiple_fields): Clip alignment
> at word size.

And this is something I'm testing now, that will try to implement the
idea of extending the width to fit fields that cross the alignment
boundary, but not other fields that don't. It makes a lot of
difference for a testcase such as this on x86-32:

struct a {
unsigned long long x : 16, y : 15, z : 33;
};
struct a fa(struct a v) {
return v;
}

struct b {
unsigned long long x : 16, y : 16, z : 32;
};
struct b fb(struct b v) {
return v;
}

Without this patch, the code for fa sucks; with it, it's almost
identical to that of fb, except for ordering of operations.

Ok to install on top of the other two, if it survives bootstrapping
and regtesting on x86-32, where it might have more visible effects
than on x86-64, that doesn't regularly use wider-than-word types?
Alexandre Oliva
2007-05-06 23:44:25 UTC
Permalink
On May 6, 2007, Alexandre Oliva <***@redhat.com> wrote:

> struct a { unsigned long long x : 16, y : 15, z : 33; };
> struct a fa(struct a v) { return v; }
> struct b { unsigned long long x : 16, y : 16, z : 32; };
> struct b fb(struct b v) { return v; }

Err... This one (quilt-refreshed) should even compile ;-)

It has actually passed bootstrap and regression testing on
a i686-linux-gnu native built on a x86_64-linux-gnu OS. Ok to install
along with the other previous patches?
Diego Novillo
2007-05-06 23:51:33 UTC
Permalink
Alexandre Oliva wrote on 05/06/07 19:44:

> It has actually passed bootstrap and regression testing on
> a i686-linux-gnu native built on a x86_64-linux-gnu OS. Ok to install
> along with the other previous patches?

Please send a complete patch with a final description. At this point, I
am totally lost and have no idea what to review.
Alexandre Oliva
2007-05-07 00:14:01 UTC
Permalink
On May 6, 2007, Diego Novillo <***@acm.org> wrote:

> Alexandre Oliva wrote on 05/06/07 19:44:
>> It has actually passed bootstrap and regression testing on
>> a i686-linux-gnu native built on a x86_64-linux-gnu OS. Ok to install
>> along with the other previous patches?

> Please send a complete patch with a final description. At this point, I
> am totally lost and have no idea what to review.

Ok, here's a consolidated version.

:ADDPATCH sra:

This patch reintroduces some of the patch that I reverted on
2007-04-30, meant to scalarize into a single variable multiple small
fields that are not accessed directly in the source code, with the
following major changes since the original patch, installed on
2007-04-05:

- only assignments to BIT_FIELD_REFs are regarded as both input and
output now; the use_all machinery is restored

- variables that are created and potentially accessed as bit-fields
are zero-initialized, as long as they're not otherwise initialized
from parameters, to avoid undefined behavior in bit-field
modifications; restore no-warning marker logic.

- rework the code that decides which fields to put in a block such
that it tries to avoid creating variables that are wider than a
word, using as guide the alignment and the mode of the base variable
or of the type of the innermost field thereof that contains the
fields being analyzed, instead of relying on
DECL_OFFSET_ALIGN-determined words.

- fix bug in optimizing away BIT_FIELD_REFs in assignments to the
entire useful range of scalarization variables on big-endian targets
that caused the bits to be stored in the wrong locations.

- rework the code to avoid the need for artificial initialization
becuase of optimizer stupidity


This combination of patches was bootstrapped and regtested on
i686-linux-gnu. Ok to install?
Andrew Pinski
2007-05-07 02:21:27 UTC
Permalink
On 5/6/07, Alexandre Oliva <***@redhat.com> wrote:
> Ok, here's a consolidated version.

And the spu-elf regressions are still there.

This makes no sense
D.2296_13 = (<unnamed-signed:52>) D.2295_12;
SR.18_36 = D.2296_13;
SR.33_38 = BIT_FIELD_REF <SR.18_36, 52, 0>;

SR.33 is long long unsigned int and SR.18 is unnamed-signed:52.

Just have a cast instead of BIT_FIELD_REF.

Thanks,
Andrew Pinski
Bernd Schmidt
2007-05-09 12:24:55 UTC
Permalink
Alexandre Oliva wrote:

> This patch reintroduces some of the patch that I reverted on
> 2007-04-30, meant to scalarize into a single variable multiple small
> fields that are not accessed directly in the source code, with the
> following major changes since the original patch, installed on
> 2007-04-05:

I've tested this a bit, and on i386 it seems to have a beneficial effect
on the set of sources I tried. On the Blackfin it does nothing at all.

> - only assignments to BIT_FIELD_REFs are regarded as both input and
> output now; the use_all machinery is restored

Sounds like there's room for improvement. Do you think it would be hard
to solve this accurately?

Is the cost logic in decide_block_copy still correct after this patch?
It seems like it could be unnecessarily pessimistic.


Bernd
--
This footer brought to you by insane German lawmakers.
Analog Devices GmbH Wilhelm-Wagenfeld-Str. 6 80807 Muenchen
Sitz der Gesellschaft Muenchen, Registergericht Muenchen HRB 40368
Geschaeftsfuehrer Thomas Wessel, William A. Martin, Margaret Seif
Roman Zippel
2007-05-09 18:31:41 UTC
Permalink
Hi,

On Sun, 6 May 2007, Alexandre Oliva wrote:

> This patch reintroduces some of the patch that I reverted on
> 2007-04-30, meant to scalarize into a single variable multiple small
> fields that are not accessed directly in the source code, with the
> following major changes since the original patch, installed on
> 2007-04-05:

I've tested a bit earlier version and it bootstrapped again. :)
The only regression left now is that gcc.c-torture/execute/bf64-1.c fails
with -O3. main should be empty but isn't and the very first test fails.
The initial RTL shows this as the first instructions:

(insn 5 4 6 ../gcc/gcc/testsuite/gcc.c-torture/execute/bf64-1.c:28 (set (reg:HI 32)
(const_int 291 [0x123])) -1 (nil)
(nil))

(insn 6 5 7 ../gcc/gcc/testsuite/gcc.c-torture/execute/bf64-1.c:28 (set (subreg:SI (reg:DI 33) 4)
(zero_extract:SI (subreg:SI (reg:HI 32) 0)
(const_int 12 [0xc])
(const_int 16 [0x10]))) -1 (nil)
(nil))

The offset is wrong here and so the test later fails then it extracts the
value again.

One other thing I noticed is that now patterns like "(zero_extract:SI
(reg:SI) 32 0)" are generated which could simply be replaced with
(reg:SI). Apparently back ends never had to deal with this so far, so gcc
generates now e.g.:

bfins %d1,%d0{#0:#32}

It's easy enough to change in the back end, but maybe it's better to fix
it somewhere else?

bye, Roman
Alexandre Oliva
2007-05-10 07:46:22 UTC
Permalink
On May 9, 2007, Bernd Schmidt <***@t-online.de> wrote:

>> - only assignments to BIT_FIELD_REFs are regarded as both input and
>> output now; the use_all machinery is restored

> Sounds like there's room for improvement. Do you think it would be hard
> to solve this accurately?

I'm not sure what kind of improvement you have in mind. Are you
talking about not doing the in/out at all and breaking up the
BIT_FIELD_REF such that it affects the scalarized variables? Or about
doing the in/out only on the fields covered by the BIT_FIELD_REF? Or
something else?

> Is the cost logic in decide_block_copy still correct after this patch?
> It seems like it could be unnecessarily pessimistic.

That's possible. I haven't paid too much attention to that yet. I'm
still busy trying to get it to generate correct code for all targets
;-/

--
Alexandre Oliva http://www.lsd.ic.unicamp.br/~oliva/
FSF Latin America Board Member http://www.fsfla.org/
Red Hat Compiler Engineer aoliva@{redhat.com, gcc.gnu.org}
Free Software Evangelist oliva@{lsd.ic.unicamp.br, gnu.org}
Bernd Schmidt
2007-05-10 10:59:28 UTC
Permalink
Alexandre Oliva wrote:
> On May 9, 2007, Bernd Schmidt <***@t-online.de> wrote:
>
>>> - only assignments to BIT_FIELD_REFs are regarded as both input and
>>> output now; the use_all machinery is restored
>
>> Sounds like there's room for improvement. Do you think it would be hard
>> to solve this accurately?
>
> I'm not sure what kind of improvement you have in mind. Are you
> talking about not doing the in/out at all and breaking up the
> BIT_FIELD_REF such that it affects the scalarized variables? Or about
> doing the in/out only on the fields covered by the BIT_FIELD_REF? Or
> something else?

Either would be an improvement, but I was thinking about the second
option, which would have improved the code in a testcase I sent earlier
in the thread.


Bernd

--
This footer brought to you by insane German lawmakers.
Analog Devices GmbH Wilhelm-Wagenfeld-Str. 6 80807 Muenchen
Sitz der Gesellschaft Muenchen, Registergericht Muenchen HRB 40368
Geschaeftsfuehrer Thomas Wessel, William A. Martin, Margaret Seif
Alexandre Oliva
2007-05-22 07:09:11 UTC
Permalink
On May 10, 2007, Bernd Schmidt <***@t-online.de> wrote:

> Alexandre Oliva wrote:
>> On May 9, 2007, Bernd Schmidt <***@t-online.de> wrote:
>>
>>>> - only assignments to BIT_FIELD_REFs are regarded as both input and
>>>> output now; the use_all machinery is restored
>>
>>> Sounds like there's room for improvement. Do you think it would be hard
>>> to solve this accurately?
>>
>> I'm not sure what kind of improvement you have in mind. Are you
>> talking about not doing the in/out at all and breaking up the
>> BIT_FIELD_REF such that it affects the scalarized variables? Or about
>> doing the in/out only on the fields covered by the BIT_FIELD_REF? Or
>> something else?

> Either would be an improvement, but I was thinking about the second
> option, which would have improved the code in a testcase I sent earlier
> in the thread.

As it turns out, this case no longer triggers the in/out bits. As in,
the code in my last patch still requests an in/out, but GCC seems to
not scalarizes variables with use_all accesses, for some reason I
haven't quite understood, so in the end the in/out code has become
completely useless.

All other cases of use_all relied on this property, and there was
something about the use of BIT_FIELD_REFs in earlier versions of this
patch that broke this property. After all the simplification I made,
the property is restored, and I could simply remove the in/out code,
obviating your optimization request.

I'm testing a revised patch now, and I'll post it as soon as I have
and verify regression testing results, which could be anywhere from 8
to 24 hours ahead.


Now, does anyone with more understanding of SRA actually understand
how this property emerges? I've been looking at this code for weeks
but this behavior still seems like magic to me. TIA,

--
Alexandre Oliva http://www.lsd.ic.unicamp.br/~oliva/
FSF Latin America Board Member http://www.fsfla.org/
Red Hat Compiler Engineer aoliva@{redhat.com, gcc.gnu.org}
Free Software Evangelist oliva@{lsd.ic.unicamp.br, gnu.org}
Roman Zippel
2007-05-22 17:33:41 UTC
Permalink
Hi,

On Tue, 22 May 2007, Alexandre Oliva wrote:

> I'm testing a revised patch now, and I'll post it as soon as I have
> and verify regression testing results, which could be anywhere from 8
> to 24 hours ahead.

BTW the patch works fine on m68k.
The only small problem is that I still see full register as bitfield
accesses. As it's not exactly a regression (the output is already better
than before), it would be IMO ok to fix this in a seperate patch, but it
would be nice to get this fixed. :)
Thanks.

bye, Roman
Alexandre Oliva
2007-05-22 20:49:24 UTC
Permalink
On May 22, 2007, Roman Zippel <***@linux-m68k.org> wrote:

> Hi,
> On Tue, 22 May 2007, Alexandre Oliva wrote:

>> I'm testing a revised patch now, and I'll post it as soon as I have
>> and verify regression testing results, which could be anywhere from 8
>> to 24 hours ahead.

> BTW the patch works fine on m68k.

Excellent, thanks!

> The only small problem is that I still see full register as bitfield
> accesses.

Ugh. I thought I'd cought all of these. You got a self-contained
testcase?

Thanks,

--
Alexandre Oliva http://www.lsd.ic.unicamp.br/~oliva/
FSF Latin America Board Member http://www.fsfla.org/
Red Hat Compiler Engineer aoliva@{redhat.com, gcc.gnu.org}
Free Software Evangelist oliva@{lsd.ic.unicamp.br, gnu.org}
Roman Zippel
2007-05-23 13:07:48 UTC
Permalink
Hi,

On Tue, 22 May 2007, Alexandre Oliva wrote:

> > The only small problem is that I still see full register as bitfield
> > accesses.
>
> Ugh. I thought I'd cought all of these. You got a self-contained
> testcase?

My initial example:

struct K { unsigned int k : 6, l : 1, j : 10, i : 15; };
struct K retmeK (struct K x)
{
return x;
}

bye, Roman
Alexandre Oliva
2007-05-22 20:56:04 UTC
Permalink
On May 22, 2007, Alexandre Oliva <***@redhat.com> wrote:

> As it turns out, this case no longer triggers the in/out bits. As in,
> the code in my last patch still requests an in/out, but GCC seems to
> not scalarizes variables with use_all accesses, for some reason I
> haven't quite understood, so in the end the in/out code has become
> completely useless.

I lied. As it turned out, removing it broke a couple of testcases.
So the previous patch is the good one. Andrew and Roman both confirm
it.

I'll look into what difference the change I made last night made and
start from that in a separate further-improvement patch, just like
Roman's request.

I don't think it makes sense to hold up this big improvement patch any
further because some additional improments can (and will) still be
made, do you?


I guess I'll also introduce code I wrote to try to catch cases in
which the magic property failed to hold.

--
Alexandre Oliva http://www.lsd.ic.unicamp.br/~oliva/
FSF Latin America Board Member http://www.fsfla.org/
Red Hat Compiler Engineer aoliva@{redhat.com, gcc.gnu.org}
Free Software Evangelist oliva@{lsd.ic.unicamp.br, gnu.org}
Alexandre Oliva
2007-05-10 07:49:06 UTC
Permalink
On May 9, 2007, Roman Zippel <***@linux-m68k.org> wrote:

> (insn 5 4 6 ../gcc/gcc/testsuite/gcc.c-torture/execute/bf64-1.c:28 (set (reg:HI 32)
> (const_int 291 [0x123])) -1 (nil)
> (nil))

> (insn 6 5 7 ../gcc/gcc/testsuite/gcc.c-torture/execute/bf64-1.c:28 (set (subreg:SI (reg:DI 33) 4)
> (zero_extract:SI (subreg:SI (reg:HI 32) 0)
> (const_int 12 [0xc])
> (const_int 16 [0x10]))) -1 (nil)
> (nil))

> The offset is wrong here and so the test later fails then it extracts the
> value again.

Thanks, I'll look into that.

> One other thing I noticed is that now patterns like "(zero_extract:SI
> (reg:SI) 32 0)" are generated which could simply be replaced with
> (reg:SI).

Ugh. I didn't bother optimizing these away because I was sure
something else would take care of them. Now that I don't it won't,
I'll avoid emitting it in the first place, thanks.

--
Alexandre Oliva http://www.lsd.ic.unicamp.br/~oliva/
FSF Latin America Board Member http://www.fsfla.org/
Red Hat Compiler Engineer aoliva@{redhat.com, gcc.gnu.org}
Free Software Evangelist oliva@{lsd.ic.unicamp.br, gnu.org}
Alexandre Oliva
2007-05-15 17:37:56 UTC
Permalink
On May 10, 2007, Alexandre Oliva <***@redhat.com> wrote:

> On May 9, 2007, Roman Zippel <***@linux-m68k.org> wrote:

>> The offset is wrong here and so the test later fails then it extracts the
>> value again.

> Thanks, I'll look into that.

IIRC this was caused by the lack of explicit type conversions.

>> One other thing I noticed is that now patterns like "(zero_extract:SI
>> (reg:SI) 32 0)" are generated which could simply be replaced with
>> (reg:SI).

> Ugh. I didn't bother optimizing these away because I was sure
> something else would take care of them. Now that I don't it won't,
> I'll avoid emitting it in the first place, thanks.

I've fixed this in this new version of the patch. I've also arranged
for us to almost never issue BIT_FIELD_REFs where simpler bitwise
operations would do. GCC can do a much better optimization job then.

I still haven't addressed Bernd's suggestion (if you don't mind, I'd
rather do that in a separate patch, as an additional improvement; this
one is quickly growing out of hand), but I intend to do so next.

Roman, Andrew, would you please confirm that this patch fixes the
regressions on m68k and spu that you reported?

Thanks,


This was bootstrapped and regression-tested on x86_64-linux-gnu. Ok
to install?

:ADDPATCH sra:
Andrew Pinski
2007-05-17 21:17:15 UTC
Permalink
On 5/15/07, Alexandre Oliva <***@redhat.com> wrote:
> Roman, Andrew, would you please confirm that this patch fixes the
> regressions on m68k and spu that you reported?

I can confirm this patch fixes the spu-elf regression.

Thanks,
Andrew Pinski
Bernd Schmidt
2007-05-28 10:41:40 UTC
Permalink
Alexandre Oliva wrote:
> This was bootstrapped and regression-tested on x86_64-linux-gnu. Ok
> to install?

This appears to have addressed all the regressions, so it can go back
in. Thanks!

> :ADDPATCH sra:

:REVIEWMAIL:


Bernd

--
This footer brought to you by insane German lawmakers.
Analog Devices GmbH Wilhelm-Wagenfeld-Str. 6 80807 Muenchen
Sitz der Gesellschaft Muenchen, Registergericht Muenchen HRB 40368
Geschaeftsfuehrer Thomas Wessel, William A. Martin, Margaret Seif
Alexandre Oliva
2007-05-31 19:57:03 UTC
Permalink
On May 28, 2007, Bernd Schmidt <***@t-online.de> wrote:

> Alexandre Oliva wrote:
>> This was bootstrapped and regression-tested on x86_64-linux-gnu. Ok
>> to install?

> This appears to have addressed all the regressions, so it can go back
> in. Thanks!

Thanks. I hadn't noticed this approval before today.

It took me a while, but I've managed to address Roman's request to
remove full-width BIT_FIELD_REFs (it required some fiddling with EH
annotations), to verify the absence of regressions in Andrew Pinski's
testcase, and to handle your request for improving the copying of
fields in and out of structs accessed with BIT_FIELD_REFs.

I've not only arranged for GCC to sync only the fields covered by the
BIT_FIELD_REF, but also for it to try to explode the BIT_FIELD_REF
access to accesses to the scalarized fields, should the containing
struct be fully scalarized. This will then enable the original struct
to be thrown away even when it's accessed by BIT_FIELD_REFs.

While working on these, I realized I'd mistakenly used
WORDS_BIG_ENDIAN rather than BITS_BIG_ENDIAN for shift counts, that
I'd used 1 rather than true in some boolean arguments that are
generally passed as true, and a few other cosmetical changes that
would render a patch on top of the one you reviewed and approved quite
large and hard to follow.

So, instead of applying the previous patch and posting the
differences, I figured it would be cleaner to post the whole thing
again. I hope you don't mind. If you prefer, it wouldn't be too hard
to split the cosmetical changes on top of the previous patch from the
behavioral changes made along the way, but I'd rather not take the
risk of messing things up by doing this.

The patch below was bootstrapped and regtested on x86_64-linux-gnu.
Ok to install?

:ADDPATCH sra:
Bernd Schmidt
2007-06-02 17:40:40 UTC
Permalink
Alexandre Oliva wrote:

> It took me a while, but I've managed to address Roman's request to
> remove full-width BIT_FIELD_REFs (it required some fiddling with EH
> annotations),

Details? Comments in the code would be best.

> to verify the absence of regressions in Andrew Pinski's
> testcase, and to handle your request for improving the copying of
> fields in and out of structs accessed with BIT_FIELD_REFs.
>
> I've not only arranged for GCC to sync only the fields covered by the
> BIT_FIELD_REF, but also for it to try to explode the BIT_FIELD_REF
> access to accesses to the scalarized fields, should the containing
> struct be fully scalarized. This will then enable the original struct
> to be thrown away even when it's accessed by BIT_FIELD_REFs.

Nice.

> So, instead of applying the previous patch and posting the
> differences, I figured it would be cleaner to post the whole thing
> again. I hope you don't mind. If you prefer, it wouldn't be too hard
> to split the cosmetical changes on top of the previous patch from the
> behavioral changes made along the way, but I'd rather not take the
> risk of messing things up by doing this.

It's up to you how you want to proceed. If you want, you can install
the previous patch with bugfixes - I haven't had enough time yet to
fully understand the new code. A few preliminary comments:

> +/* Return true if a BIT_FIELD_REF<(FLD->parent), BLEN, BPOS>
> + expression (refereced as BF below) accesses any of the bits in FLD,
> + false if it doesn't. If DATA is non-null, the bit length of FLD
> + will be stored in DATA[0] and the initial bit position of FLD will
> + be stored in DATA[1], the number of bits of FLD that overlap with
> + BF will be stored in DATA[2], and the initial bit of FLD that
> + overlaps with BF will be stored in DATA[3]. */
> +
> +static bool
> +bitfield_overlaps_p (tree blen, tree bpos, struct sra_elt *fld, tree *data)

I'd rather have four separate, named variables. This is too opaque for
the reader.

> +static void
> +sra_explode_bitfield_assignment (tree var, tree vpos, bool to_var,
> + tree *listp, tree blen, tree bpos,
> + struct sra_elt *elt)

> +static void
> +sra_sync_for_bitfield_assignment (tree *listbeforep, tree *listafterp,
> + tree blen, tree bpos,
> + struct sra_elt *elt)

No comments for these functions.

> /* Preserve EH semantics. */
> if (stmt_ends_bb_p (stmt))
> {
> tree_stmt_iterator tsi;
> - tree first;
> + tree first, blist = NULL;
> + bool thr = (bsi->bb->flags & EDGE_COMPLEX) != 0;
>
> /* Extract the first statement from LIST. */
> tsi = tsi_start (list);
> - first = tsi_stmt (tsi);
> + for (first = tsi_stmt (tsi);
> + thr && !tsi_end_p (tsi) && !tree_could_throw_p (first);
> + first = tsi_stmt (tsi))
> + {
> + tsi_delink (&tsi);
> + append_to_statement_list (first, &blist);
> + }
> +
> tsi_delink (&tsi);
>
> + if (blist)
> + sra_insert_before (bsi, blist);
> +

This could use more comments as to what's going on.


Bernd
--
This footer brought to you by insane German lawmakers.
Analog Devices GmbH Wilhelm-Wagenfeld-Str. 6 80807 Muenchen
Sitz der Gesellschaft Muenchen, Registergericht Muenchen HRB 40368
Geschaeftsfuehrer Thomas Wessel, William A. Martin, Margaret Seif
Alexandre Oliva
2007-06-06 03:17:04 UTC
Permalink
On Jun 2, 2007, Bernd Schmidt <***@t-online.de> wrote:

> Alexandre Oliva wrote:
>> It took me a while, but I've managed to address Roman's request to
>> remove full-width BIT_FIELD_REFs (it required some fiddling with EH
>> annotations),

> Details? Comments in the code would be best.

As for comments, see below. It's just that we can't assume any longer
that the first instruction in a replacement sequence is the one
containing the memory access that may trap.

>> +/* Return true if a BIT_FIELD_REF<(FLD->parent), BLEN, BPOS>
>> + expression (refereced as BF below) accesses any of the bits in FLD,
>> + false if it doesn't. If DATA is non-null, the bit length of FLD
>> + will be stored in DATA[0] and the initial bit position of FLD will
>> + be stored in DATA[1], the number of bits of FLD that overlap with
>> + BF will be stored in DATA[2], and the initial bit of FLD that
>> + overlaps with BF will be stored in DATA[3]. */
>> +
>> +static bool
>> +bitfield_overlaps_p (tree blen, tree bpos, struct sra_elt *fld, tree *data)

> I'd rather have four separate, named variables. This is too opaque for
> the reader.

How about a single struct, as in the patch below?

>> +static void
>> +sra_explode_bitfield_assignment (tree var, tree vpos, bool to_var,
>> + tree *listp, tree blen, tree bpos,
>> + struct sra_elt *elt)

>> +static void
>> +sra_sync_for_bitfield_assignment (tree *listbeforep, tree *listafterp,
>> + tree blen, tree bpos,
>> + struct sra_elt *elt)

> No comments for these functions.

Ugh, sorry. I even remember having started a pass over the patch
adding missing comments for new functions, but I must have got off
track. Thanks for catching these.

>> /* Preserve EH semantics. */

> This could use more comments as to what's going on.

Is what I added to this revised patch enough?


Bootstrapped and regtested on x86_64-linux-gnu. Ok?

Even if it's approved, I probably won't check it in before Sunday,
since I'm going to be on the road until then, with good internet
access but little time for it :-(
Alexandre Oliva
2007-06-25 18:13:21 UTC
Permalink
On Jun 6, 2007, Alexandre Oliva <***@redhat.com> wrote:

> Bootstrapped and regtested on x86_64-linux-gnu. Ok?

> Even if it's approved, I probably won't check it in before Sunday,
> since I'm going to be on the road until then, with good internet
> access but little time for it :-(

> for gcc/ChangeLog
> from Alexandre Oliva <***@redhat.com>

> PR middle-end/22156
> * tree-sra.c (struct sra_elt): Add in_bitfld_block.
> (sra_hash_tree): Handle BIT_FIELD_REFs.
> (sra_elt_hash): Don't hash bitfld blocks.
> (sra_elt_eq): Skip them in parent compares as well. Handle
> BIT_FIELD_REFs.
> (build_element_name_1): Handle BIT_FIELD_REFs.
> (instantiate_element): Propagate nowarn from parents.
> Gimple-zero-initialize all bit-field variables that are not
> part of parameters that are going to be scalarized on entry.
> (instantiate_missing_elements_1): Return the sra_elt.
> (canon_type_for_field): New.
> (try_instantiate_multiple_fields): New. Infer widest possible
> access mode from decl or member type, but clip it at word
> size, and only widen it if a field crosses an alignment
> boundary.
> (instantiate_missing_elements): Use them.
> (generate_one_element_ref): Handle BIT_FIELD_REFs.
> (scalar_bitfield_p): New.
> (sra_build_assignment): Optimize assignments from scalarizable
> BIT_FIELD_REFs. Use BITS_BIG_ENDIAN to determine shift
> counts.
> (REPLDUP): New.
> (sra_build_bf_assignment): New. Optimize assignments to
> scalarizable BIT_FIELD_REFs.
> (sra_build_elt_assignment): New. Optimize BIT_FIELD_REF
> assignments to full variables.
> (generate_copy_inout): Use the new macros and functions.
> (generate_element_copy): Likewise. Handle bitfld differences.
> (generate_element_zero): Don't recurse for blocks. Use
> sra_build_elt_assignment.
> (generate_one_element_init): Take elt instead of var. Use
> sra_build_elt_assignment.
> (generate_element_init_1): Adjust.
> (bitfield_overlap_info): New struct.
> (bitfield_overlaps_p): New.
> (sra_explode_bitfield_assignment): New.
> (sra_sync_for_bitfield_assignment): New.
> (scalarize_use): Re-expand assignment to scalarized
> BIT_FIELD_REFs of gimple_regs. Explode or sync needed members
> for BIT_FIELD_REFs accesses or assignments. Use REPLDUP.
> (scalarize_copy): Use REPLDUP.
> (scalarize_ldst): Move assert before dereference. Adjust EH
> handling.
> (dump_sra_elt_name): Handle BIT_FIELD_REFs.

:ADDPATCH tree-sra:

Ping?

http://gcc.gnu.org/ml/gcc-patches/2007-06/msg00300.html

--
Alexandre Oliva http://www.lsd.ic.unicamp.br/~oliva/
FSF Latin America Board Member http://www.fsfla.org/
Red Hat Compiler Engineer aoliva@{redhat.com, gcc.gnu.org}
Free Software Evangelist oliva@{lsd.ic.unicamp.br, gnu.org}
Bernd Schmidt
2007-06-26 22:10:58 UTC
Permalink
Alexandre Oliva wrote:
> Bootstrapped and regtested on x86_64-linux-gnu. Ok?

This fails as below. At this point I think I'd prefer if we could apply
the earlier stable patch first and then do enhancements on top of it.


Bernd

Native configuration is i686-pc-linux-gnu

=== gcc tests ===
@@ -31,6 +31,25 @@
Running
/local/src/egcs/trunk/gcc/testsuite/gcc.c-torture/compile/compile.exp ...
Running
/local/src/egcs/trunk/gcc/testsuite/gcc.c-torture/execute/builtins/builtins.exp
...
Running
/local/src/egcs/trunk/gcc/testsuite/gcc.c-torture/execute/execute.exp ...
+FAIL: gcc.c-torture/execute/20031211-1.c compilation, -O1 (internal
compiler error)
+FAIL: gcc.c-torture/execute/20031211-1.c compilation, -O2 (internal
compiler error)
+FAIL: gcc.c-torture/execute/20031211-1.c compilation, -O3
-fomit-frame-pointer (internal compiler error)
+FAIL: gcc.c-torture/execute/20031211-1.c compilation, -O3 -g
(internal compiler error)
+FAIL: gcc.c-torture/execute/20031211-1.c compilation, -Os (internal
compiler error)
+FAIL: gcc.c-torture/execute/20040709-1.c compilation, -O3
-fomit-frame-pointer (internal compiler error)
+FAIL: gcc.c-torture/execute/20040709-1.c compilation, -O3
-fomit-frame-pointer -funroll-loops (internal compiler error)
+FAIL: gcc.c-torture/execute/20040709-1.c compilation, -O3
-fomit-frame-pointer -funroll-all-loops -finline-functions (internal
compiler error)
+FAIL: gcc.c-torture/execute/20040709-1.c compilation, -O3 -g
(internal compiler error)
+FAIL: gcc.c-torture/execute/20040709-2.c compilation, -O1 (internal
compiler error)
+FAIL: gcc.c-torture/execute/20040709-2.c compilation, -O2 (internal
compiler error)
+FAIL: gcc.c-torture/execute/20040709-2.c compilation, -O3
-fomit-frame-pointer (internal compiler error)
+FAIL: gcc.c-torture/execute/20040709-2.c compilation, -O3
-fomit-frame-pointer -funroll-loops (internal compiler error)
+FAIL: gcc.c-torture/execute/20040709-2.c compilation, -O3
-fomit-frame-pointer -funroll-all-loops -finline-functions (internal
compiler error)
+FAIL: gcc.c-torture/execute/20040709-2.c compilation, -O3 -g
(internal compiler error)
+FAIL: gcc.c-torture/execute/20040709-2.c compilation, -Os (internal
compiler error)
+FAIL: gcc.c-torture/execute/bf64-1.c compilation, -O3
-fomit-frame-pointer (internal compiler error)
+FAIL: gcc.c-torture/execute/bf64-1.c compilation, -O3 -g (internal
compiler error)
+FAIL: gcc.c-torture/execute/bf64-1.c compilation, -Os (internal
compiler error)

--
This footer brought to you by insane German lawmakers.
Analog Devices GmbH Wilhelm-Wagenfeld-Str. 6 80807 Muenchen
Sitz der Gesellschaft Muenchen, Registergericht Muenchen HRB 40368
Geschaeftsfuehrer Thomas Wessel, William A. Martin, Margaret Seif
Alexandre Oliva
2007-06-28 04:23:57 UTC
Permalink
On Jun 26, 2007, Bernd Schmidt <***@t-online.de> wrote:

> This fails as below.

Thanks. I've fixed these and then some, then re-tested on both
i686-linux-gnu and x86_64-linux-gnu, no regressions.

> At this point I think I'd prefer if we could apply the earlier
> stable patch first and then do enhancements on top of it.

After some of the problems I caught in this last round, I'm no longer
comfortable installing the earlier "stable" patch. There were some
type compatibility and sign extension problems that hadn't been
exposed by testing but that were there, and I fear they might break
stuff.

Please consider this revised patch instead. The changes from the last
patch are:

- convert bit-field operands to bitsizetype before doing size_binop
arithmetic with them. This is what caused the ICEs you've found on
x86. Some other parts of the compiler create BIT_FIELD_REFs with
non-sizetype operands.

- use size_binop all over instead of int_const_binop, for stronger
type safety, except in the few places where we're building integral
constants rather than sizes

- in sra_build_assignment, use the correct type for the temp variable
that should hold the result of converting var to the same-sized
unsigned integral type (this was wrong in the "stable" patch)

- in sra_build_bf_assignment, use the correct type to decide whether
the incoming variable needs conversion to an unsigned integral type,
and convert it to the unsigned type instead of doing bit arithmetic
between operands with different signedness

- in sra_build_bf_assignment, filter-out sign-extension bits from
the input operand being replaced in a wider variable

- in sra_build_bf_assignment, make sure we convert the input field to
the correct unsigned type instead of relying on implicit type
conversion by the left shift.

I went through the type conversions, expressions and assignments in
the patch one more time, and I couldn't find any other mismatches.

Ok?
Roman Zippel
2007-07-03 00:46:35 UTC
Permalink
Hi,

On Thu, 28 Jun 2007, Alexandre Oliva wrote:

> On Jun 26, 2007, Bernd Schmidt <***@t-online.de> wrote:
>
> > This fails as below.
>
> Thanks. I've fixed these and then some, then re-tested on both
> i686-linux-gnu and x86_64-linux-gnu, no regressions.

I added this patch to my test run and it bootstraps on m68k, but there are
a few test failures, e.g. gcc.c-torture/execute/20031211-1.c is a simple one.

bye, Roman
Alexandre Oliva
2007-07-06 08:34:27 UTC
Permalink
On Jul 2, 2007, Roman Zippel <***@linux-m68k.org> wrote:

> Hi,
> On Thu, 28 Jun 2007, Alexandre Oliva wrote:

>> On Jun 26, 2007, Bernd Schmidt <***@t-online.de> wrote:
>>
>> > This fails as below.
>>
>> Thanks. I've fixed these and then some, then re-tested on both
>> i686-linux-gnu and x86_64-linux-gnu, no regressions.

> I added this patch to my test run and it bootstraps on m68k, but there are
> a few test failures, e.g. gcc.c-torture/execute/20031211-1.c is a simple one.

Thanks for the report, and for having tested the patch below and
confirming it fixed the problem. The patch lost sight of the actual
width of a bit-field and used its underlying type's width instead for
bit-field operations, so, in order to extract the single-bit bit-field
from the scalarized variable, it did a whole lot of right-shifting on
big-endian boxes. Oops.

I realized the lack of truncation could also be a problem for other
scalarized bit-field variables, so I arranged for accesses to these
variables to be handled as bit-fields as well. I'm not entirely happy
about this, but I don't see that it's safe to do otherwise.

Here's the patch, bootstrapped and regression-tested on
x86_64-linux-gnu and i686-pc-linux-gnu (with an oldish, pre-sccvn
tree, to be able to test ada as well). Tests on ppc-linux-gnu and
ppc64-linux-gnu underway.

Ok to install?
Alexandre Oliva
2007-08-24 04:24:10 UTC
Permalink
On Jul 6, 2007, Alexandre Oliva <***@redhat.com> wrote:

> Here's the patch, bootstrapped and regression-tested on
> x86_64-linux-gnu and i686-pc-linux-gnu (with an oldish, pre-sccvn
> tree, to be able to test ada as well). Tests on ppc-linux-gnu and
> ppc64-linux-gnu underway.

> Ok to install?

Ping?

http://gcc.gnu.org/ml/gcc-patches/2007-07/msg00556.html

> for gcc/ChangeLog
> from Alexandre Oliva <***@redhat.com>

> PR middle-end/22156
> * tree-sra.c (struct sra_elt): Add in_bitfld_block.
> (sra_hash_tree): Handle BIT_FIELD_REFs.
> (sra_elt_hash): Don't hash bitfld blocks.
> (sra_elt_eq): Skip them in parent compares as well. Handle
> BIT_FIELD_REFs.
> (build_element_name_1): Handle BIT_FIELD_REFs.
> (instantiate_element): Propagate nowarn from parents. Create
> BIT_FIELD_REF for variables that are widened by scalarization.
> Gimple-zero-initialize all bit-field variables that are not
> part of parameters that are going to be scalarized on entry.
> (instantiate_missing_elements_1): Return the sra_elt.
> (canon_type_for_field): New.
> (try_instantiate_multiple_fields): New. Infer widest possible
> access mode from decl or member type, but clip it at word
> size, and only widen it if a field crosses an alignment
> boundary.
> (instantiate_missing_elements): Use them.
> (generate_one_element_ref): Handle BIT_FIELD_REFs.
> (scalar_bitfield_p): New.
> (sra_build_assignment): Optimize assignments from scalarizable
> BIT_FIELD_REFs. Use BITS_BIG_ENDIAN to determine shift
> counts.
> (REPLDUP): New.
> (sra_build_bf_assignment): New. Optimize assignments to
> scalarizable BIT_FIELD_REFs.
> (sra_build_elt_assignment): New. Optimize BIT_FIELD_REF
> assignments to full variables.
> (generate_copy_inout): Use the new macros and functions.
> (generate_element_copy): Likewise. Handle bitfld differences.
> (generate_element_zero): Don't recurse for blocks. Use
> sra_build_elt_assignment.
> (generate_one_element_init): Take elt instead of var. Use
> sra_build_elt_assignment.
> (generate_element_init_1): Adjust.
> (bitfield_overlap_info): New struct.
> (bitfield_overlaps_p): New.
> (sra_explode_bitfield_assignment): New. Adjust widened
> variables to account for endianness.
> (sra_sync_for_bitfield_assignment): New.
> (scalarize_use): Re-expand assignment to/from scalarized
> BIT_FIELD_REFs. Explode or sync needed members for
> BIT_FIELD_REFs accesses or assignments. Use REPLDUP.
> (scalarize_copy): Use REPLDUP.
> (scalarize_ldst): Move assert before dereference. Adjust EH
> handling.
> (dump_sra_elt_name): Handle BIT_FIELD_REFs.

--
Alexandre Oliva http://www.lsd.ic.unicamp.br/~oliva/
FSF Latin America Board Member http://www.fsfla.org/
Red Hat Compiler Engineer aoliva@{redhat.com, gcc.gnu.org}
Free Software Evangelist oliva@{lsd.ic.unicamp.br, gnu.org}
Alexandre Oliva
2007-09-28 03:13:50 UTC
Permalink
On Aug 24, 2007, Alexandre Oliva <***@redhat.com> wrote:

> On Jul 6, 2007, Alexandre Oliva <***@redhat.com> wrote:
>> Here's the patch, bootstrapped and regression-tested on
>> x86_64-linux-gnu and i686-pc-linux-gnu (with an oldish, pre-sccvn
>> tree, to be able to test ada as well). Tests on ppc-linux-gnu and
>> ppc64-linux-gnu underway.

>> Ok to install?

> Ping?

Ping^2?

Tests on ppc* passed back then, BTW.

> http://gcc.gnu.org/ml/gcc-patches/2007-07/msg00556.html

>> for gcc/ChangeLog
>> from Alexandre Oliva <***@redhat.com>

>> PR middle-end/22156
>> * tree-sra.c (struct sra_elt): Add in_bitfld_block.
>> (sra_hash_tree): Handle BIT_FIELD_REFs.
>> (sra_elt_hash): Don't hash bitfld blocks.
>> (sra_elt_eq): Skip them in parent compares as well. Handle
>> BIT_FIELD_REFs.
>> (build_element_name_1): Handle BIT_FIELD_REFs.
>> (instantiate_element): Propagate nowarn from parents. Create
>> BIT_FIELD_REF for variables that are widened by scalarization.
>> Gimple-zero-initialize all bit-field variables that are not
>> part of parameters that are going to be scalarized on entry.
>> (instantiate_missing_elements_1): Return the sra_elt.
>> (canon_type_for_field): New.
>> (try_instantiate_multiple_fields): New. Infer widest possible
>> access mode from decl or member type, but clip it at word
>> size, and only widen it if a field crosses an alignment
>> boundary.
>> (instantiate_missing_elements): Use them.
>> (generate_one_element_ref): Handle BIT_FIELD_REFs.
>> (scalar_bitfield_p): New.
>> (sra_build_assignment): Optimize assignments from scalarizable
>> BIT_FIELD_REFs. Use BITS_BIG_ENDIAN to determine shift
>> counts.
>> (REPLDUP): New.
>> (sra_build_bf_assignment): New. Optimize assignments to
>> scalarizable BIT_FIELD_REFs.
>> (sra_build_elt_assignment): New. Optimize BIT_FIELD_REF
>> assignments to full variables.
>> (generate_copy_inout): Use the new macros and functions.
>> (generate_element_copy): Likewise. Handle bitfld differences.
>> (generate_element_zero): Don't recurse for blocks. Use
>> sra_build_elt_assignment.
>> (generate_one_element_init): Take elt instead of var. Use
>> sra_build_elt_assignment.
>> (generate_element_init_1): Adjust.
>> (bitfield_overlap_info): New struct.
>> (bitfield_overlaps_p): New.
>> (sra_explode_bitfield_assignment): New. Adjust widened
>> variables to account for endianness.
>> (sra_sync_for_bitfield_assignment): New.
>> (scalarize_use): Re-expand assignment to/from scalarized
>> BIT_FIELD_REFs. Explode or sync needed members for
>> BIT_FIELD_REFs accesses or assignments. Use REPLDUP.
>> (scalarize_copy): Use REPLDUP.
>> (scalarize_ldst): Move assert before dereference. Adjust EH
>> handling.
>> (dump_sra_elt_name): Handle BIT_FIELD_REFs.

--
Alexandre Oliva http://www.lsd.ic.unicamp.br/~oliva/
FSF Latin America Board Member http://www.fsfla.org/
Red Hat Compiler Engineer aoliva@{redhat.com, gcc.gnu.org}
Free Software Evangelist oliva@{lsd.ic.unicamp.br, gnu.org}
Richard Guenther
2007-10-02 17:15:31 UTC
Permalink
On 9/28/07, Alexandre Oliva <***@redhat.com> wrote:
> On Aug 24, 2007, Alexandre Oliva <***@redhat.com> wrote:
>
> > On Jul 6, 2007, Alexandre Oliva <***@redhat.com> wrote:
> >> Here's the patch, bootstrapped and regression-tested on
> >> x86_64-linux-gnu and i686-pc-linux-gnu (with an oldish, pre-sccvn
> >> tree, to be able to test ada as well). Tests on ppc-linux-gnu and
> >> ppc64-linux-gnu underway.
>
> >> Ok to install?
>
> > Ping?
>
> Ping^2?
>
> Tests on ppc* passed back then, BTW.

It seems this was committed now, and causes ncurses build to fail on
x86_64:

../ncurses/./base/lib_addch.c: In function 'render_char':
../ncurses/./base/lib_addch.c:541: internal compiler error: in
bitfield_overlaps_p, at tree-sra.c:2901
Please submit a full bug report,
with preprocessed source if appropriate.
See <http://bugs.opensuse.org/> for instructions.
make[1]: *** [../obj_s/lib_addch.o] Error 1

I'll file a bugreport once I get hands on preprocessed source.

Richard.
Richard Sandiford
2007-10-03 08:38:34 UTC
Permalink
"Richard Guenther" <***@gmail.com> writes:
> On 9/28/07, Alexandre Oliva <***@redhat.com> wrote:
>> On Aug 24, 2007, Alexandre Oliva <***@redhat.com> wrote:
>>
>> > On Jul 6, 2007, Alexandre Oliva <***@redhat.com> wrote:
>> >> Here's the patch, bootstrapped and regression-tested on
>> >> x86_64-linux-gnu and i686-pc-linux-gnu (with an oldish, pre-sccvn
>> >> tree, to be able to test ada as well). Tests on ppc-linux-gnu and
>> >> ppc64-linux-gnu underway.
>>
>> >> Ok to install?
>>
>> > Ping?
>>
>> Ping^2?
>>
>> Tests on ppc* passed back then, BTW.
>
> It seems this was committed now, and causes ncurses build to fail on
> x86_64:
>
> ../ncurses/./base/lib_addch.c: In function 'render_char':
> ../ncurses/./base/lib_addch.c:541: internal compiler error: in
> bitfield_overlaps_p, at tree-sra.c:2901
> Please submit a full bug report,
> with preprocessed source if appropriate.
> See <http://bugs.opensuse.org/> for instructions.
> make[1]: *** [../obj_s/lib_addch.o] Error 1
>
> I'll file a bugreport once I get hands on preprocessed source.

For the record, it also causes many failures on mips-linux-gnu,
such as the following execute.exp ones:

FAIL: gcc.c-torture/execute/20000113-1.c execution, -O1
FAIL: gcc.c-torture/execute/20000113-1.c execution, -O2
FAIL: gcc.c-torture/execute/20000113-1.c execution, -O3 -fomit-frame-pointer
FAIL: gcc.c-torture/execute/20000113-1.c execution, -O3 -g
FAIL: gcc.c-torture/execute/20000113-1.c execution, -Os
FAIL: gcc.c-torture/execute/20031211-1.c execution, -O1
FAIL: gcc.c-torture/execute/20031211-1.c execution, -O2
FAIL: gcc.c-torture/execute/20031211-1.c execution, -O3 -fomit-frame-pointer
FAIL: gcc.c-torture/execute/20031211-1.c execution, -O3 -g
FAIL: gcc.c-torture/execute/20031211-1.c execution, -Os
FAIL: gcc.c-torture/execute/20031211-2.c execution, -O1
FAIL: gcc.c-torture/execute/20031211-2.c execution, -O2
FAIL: gcc.c-torture/execute/20031211-2.c execution, -O3 -fomit-frame-pointer
FAIL: gcc.c-torture/execute/20031211-2.c execution, -O3 -g
FAIL: gcc.c-torture/execute/20031211-2.c execution, -Os
FAIL: gcc.c-torture/execute/20040307-1.c execution, -O1
FAIL: gcc.c-torture/execute/20040307-1.c execution, -O2
FAIL: gcc.c-torture/execute/20040307-1.c execution, -O3 -fomit-frame-pointer
FAIL: gcc.c-torture/execute/20040307-1.c execution, -O3 -fomit-frame-pointer -funroll-loops
FAIL: gcc.c-torture/execute/20040307-1.c execution, -O3 -fomit-frame-pointer -funroll-all-loops -finline-functions
FAIL: gcc.c-torture/execute/20040307-1.c execution, -O3 -g
FAIL: gcc.c-torture/execute/20040307-1.c execution, -Os
FAIL: gcc.c-torture/execute/20040709-1.c execution, -O1
FAIL: gcc.c-torture/execute/20040709-1.c execution, -O2
FAIL: gcc.c-torture/execute/20040709-1.c execution, -O3 -fomit-frame-pointer
FAIL: gcc.c-torture/execute/20040709-1.c execution, -O3 -fomit-frame-pointer -funroll-loops
FAIL: gcc.c-torture/execute/20040709-1.c execution, -O3 -fomit-frame-pointer -funroll-all-loops -finline-functions
FAIL: gcc.c-torture/execute/20040709-1.c execution, -O3 -g
FAIL: gcc.c-torture/execute/20040709-1.c execution, -Os
FAIL: gcc.c-torture/execute/20040709-2.c execution, -O1
FAIL: gcc.c-torture/execute/20040709-2.c execution, -O2
FAIL: gcc.c-torture/execute/20040709-2.c execution, -O3 -fomit-frame-pointer
FAIL: gcc.c-torture/execute/20040709-2.c execution, -O3 -fomit-frame-pointer -funroll-loops
FAIL: gcc.c-torture/execute/20040709-2.c execution, -O3 -fomit-frame-pointer -funroll-all-loops -finline-functions
FAIL: gcc.c-torture/execute/20040709-2.c execution, -O3 -g
FAIL: gcc.c-torture/execute/20040709-2.c execution, -Os
FAIL: gcc.c-torture/execute/920908-2.c execution, -O1
FAIL: gcc.c-torture/execute/920908-2.c execution, -O2
FAIL: gcc.c-torture/execute/920908-2.c execution, -O3 -fomit-frame-pointer
FAIL: gcc.c-torture/execute/920908-2.c execution, -O3 -g
FAIL: gcc.c-torture/execute/920908-2.c execution, -Os
FAIL: gcc.c-torture/execute/bf64-1.c execution, -O3 -fomit-frame-pointer
FAIL: gcc.c-torture/execute/bf64-1.c execution, -O3 -g
FAIL: gcc.c-torture/execute/bitfld-1.c execution, -O1
FAIL: gcc.c-torture/execute/bitfld-1.c execution, -O2
FAIL: gcc.c-torture/execute/bitfld-1.c execution, -O3 -fomit-frame-pointer
FAIL: gcc.c-torture/execute/bitfld-1.c execution, -O3 -g
FAIL: gcc.c-torture/execute/bitfld-1.c execution, -Os
FAIL: gcc.c-torture/execute/compndlit-1.c execution, -O1
FAIL: gcc.c-torture/execute/compndlit-1.c execution, -O2
FAIL: gcc.c-torture/execute/compndlit-1.c execution, -O3 -fomit-frame-pointer
FAIL: gcc.c-torture/execute/compndlit-1.c execution, -O3 -g
FAIL: gcc.c-torture/execute/compndlit-1.c execution, -Os

FWIW, the gcc.c-torture/execute/20000113-1.c failures can easily be
reproduced with a cross-compiler, because after the patch, foobar is
optimised to a call to abort.

Richard
Alexandre Oliva
2007-10-03 16:49:34 UTC
Permalink
On Oct 3, 2007, Richard Sandiford <***@nildram.co.uk> wrote:

> For the record, it also causes many failures on mips-linux-gnu,
> such as the following execute.exp ones:

> FWIW, the gcc.c-torture/execute/20000113-1.c failures can easily be
> reproduced with a cross-compiler, because after the patch, foobar is
> optimised to a call to abort.

This patch appears to fix at least the first one. I haven't given it
a full round of testing yet, FWIW.
Richard Sandiford
2007-10-03 18:22:05 UTC
Permalink
Alexandre Oliva <***@redhat.com> writes:
> On Oct 3, 2007, Richard Sandiford <***@nildram.co.uk> wrote:
>> For the record, it also causes many failures on mips-linux-gnu,
>> such as the following execute.exp ones:
>
>> FWIW, the gcc.c-torture/execute/20000113-1.c failures can easily be
>> reproduced with a cross-compiler, because after the patch, foobar is
>> optimised to a call to abort.
>
> This patch appears to fix at least the first one. I haven't given it
> a full round of testing yet, FWIW.

Thanks for the quick fix. I'll give it an overnight spin on
mipsisa32-elf (which also shows the problem) and let you know
how it goes.

Richard
Richard Sandiford
2007-10-04 19:56:30 UTC
Permalink
Alexandre Oliva <***@redhat.com> writes:
> On Oct 3, 2007, Richard Sandiford <***@nildram.co.uk> wrote:
>> For the record, it also causes many failures on mips-linux-gnu,
>> such as the following execute.exp ones:
>
>> FWIW, the gcc.c-torture/execute/20000113-1.c failures can easily be
>> reproduced with a cross-compiler, because after the patch, foobar is
>> optimised to a call to abort.
>
> This patch appears to fix at least the first one. I haven't given it
> a full round of testing yet, FWIW.

Thanks, it stops the testcase being optimised to abort, but it does
still abort. The following line of final_cleanup dump looks suspicious:

b.x3 = (<unnamed-unsigned:3>) ((unsigned char) ((D.1242 - (int) D.1234) * D.1242) + (unsigned char) D.1238) | D.1238 & 7;

Why the "|"? (The bit on the left hand side of the "|" looks fine.)

FWIW, if you need an executable testcase, the same problem occurs on
mipsisa32-elf. I'm happy to test any revised patches though.

Richard
Alexandre Oliva
2007-10-05 17:42:47 UTC
Permalink
On Oct 4, 2007, Richard Sandiford <***@nildram.co.uk> wrote:

> Alexandre Oliva <***@redhat.com> writes:
>> On Oct 3, 2007, Richard Sandiford <***@nildram.co.uk> wrote:
>>> For the record, it also causes many failures on mips-linux-gnu,
>>> such as the following execute.exp ones:
>>
>>> FWIW, the gcc.c-torture/execute/20000113-1.c failures can easily be
>>> reproduced with a cross-compiler, because after the patch, foobar is
>>> optimised to a call to abort.
>>
>> This patch appears to fix at least the first one. I haven't given it
>> a full round of testing yet, FWIW.

> Thanks, it stops the testcase being optimised to abort, but it does
> still abort. The following line of final_cleanup dump looks suspicious:

> b.x3 = (<unnamed-unsigned:3>) ((unsigned char) ((D.1242 - (int) D.1234) * D.1242) + (unsigned char) D.1238) | D.1238 & 7;

> Why the "|"?

Another case of overflow and sign-extension when computing bit masks.
I haven't tried to figured out why it doesn't hit on other
architectures.

Anyhow, this patch, that supersedes the previous, fixes it, and with
it the testcase appears to yield the correct result.

> FWIW, if you need an executable testcase, the same problem occurs on
> mipsisa32-elf. I'm happy to test any revised patches though.

I'm bootstrapping on x86_64-linux-gnu right now. I don't think there
will be time for me to start a cross toolchain build after that,
before I leave on another short trip during the weekend, so I'd
appreciate if you could give it a spin.

If this isn't enough, on Monday, or perhaps even Sunday evening, I'll
get myself a cross mipsisa32-elf toolchain and test environment and
look into all the remaining failures you reported.

Hopefully this patch will fix the hppa-linux-gnu Ada compiler
miscompilation too. Dave, Eric, would you please give it a try and
let me know? I don't have access to this platform, so it would be
difficult for me to diagnose the compiler miscompilation :-(

Perhaps running a bootstrap-disabled build, then letting me know about
regressions it displays? Then I can try to duplicate them locally and
hopefully they will lead to a fix for the bootstrap problem.

Thanks,
Richard Sandiford
2007-10-06 08:01:51 UTC
Permalink
Alexandre Oliva <***@redhat.com> writes:
> Anyhow, this patch, that supersedes the previous, fixes it, and with
> it the testcase appears to yield the correct result.

Thanks, the results after this patch are much better. As you say,
20000113-1.c is well and truly fixed. The only remaining C regressions are:

FAIL: gcc.c-torture/execute/20031211-1.c execution, -O1
FAIL: gcc.c-torture/execute/20031211-1.c execution, -O2
FAIL: gcc.c-torture/execute/20031211-1.c execution, -O3 -fomit-frame-pointer
FAIL: gcc.c-torture/execute/20031211-1.c execution, -O3 -g
FAIL: gcc.c-torture/execute/20031211-1.c execution, -Os
FAIL: gcc.c-torture/execute/20040709-1.c execution, -Os
FAIL: gcc.c-torture/execute/20040709-2.c execution, -O1
FAIL: gcc.c-torture/execute/20040709-2.c execution, -O2
FAIL: gcc.c-torture/execute/20040709-2.c execution, -O3 -fomit-frame-pointer
FAIL: gcc.c-torture/execute/20040709-2.c execution, -O3 -fomit-frame-pointer -funroll-loops
FAIL: gcc.c-torture/execute/20040709-2.c execution, -O3 -fomit-frame-pointer -funroll-all-loops -finline-functions
FAIL: gcc.c-torture/execute/20040709-2.c execution, -O3 -g
FAIL: gcc.c-torture/execute/20040709-2.c execution, -Os
FAIL: gcc.c-torture/execute/920908-2.c execution, -O1
FAIL: gcc.c-torture/execute/920908-2.c execution, -O2
FAIL: gcc.c-torture/execute/920908-2.c execution, -O3 -fomit-frame-pointer
FAIL: gcc.c-torture/execute/920908-2.c execution, -O3 -g
FAIL: gcc.c-torture/execute/920908-2.c execution, -Os
FAIL: gcc.c-torture/execute/950628-1.c execution, -Os
FAIL: gcc.c-torture/execute/991118-1.c compilation, -O3 -fomit-frame-pointer (internal compiler error)
FAIL: gcc.c-torture/execute/991118-1.c compilation, -O3 -g (internal compiler error)
FAIL: gcc.c-torture/execute/bf64-1.c execution, -O3 -fomit-frame-pointer
FAIL: gcc.c-torture/execute/bf64-1.c execution, -O3 -g

20031211-1.c is a pleasingly simple testcase, and as with the original
20000113-1.c failure, it's optimised to abort() by final_cleanup.

Don't worry about the 991118-1.c failure. It's an RTL ICE, and probably
a config/mips bug. I'll have a look at it.

Richard
John David Anglin
2007-10-06 21:05:38 UTC
Permalink
> Alexandre Oliva <***@redhat.com> writes:
> > Anyhow, this patch, that supersedes the previous, fixes it, and with
> > it the testcase appears to yield the correct result.
>
> Thanks, the results after this patch are much better. As you say,
> 20000113-1.c is well and truly fixed. The only remaining C regressions are:

> FAIL: gcc.c-torture/execute/991118-1.c compilation, -O3 -fomit-frame-pointer (internal compiler error)
> FAIL: gcc.c-torture/execute/991118-1.c compilation, -O3 -g (internal compiler error)

> Don't worry about the 991118-1.c failure. It's an RTL ICE, and probably
> a config/mips bug. I'll have a look at it.

I'm also still seeing the failure of 991118-1.c. These are the remaining
new fails on hppa-unknown-linux-gnu:

FAIL: gcc.c-torture/execute/991118-1.c compilation, -O3 -fomit-frame-pointer (
internal compiler error)
FAIL: gcc.c-torture/execute/991118-1.c compilation, -O3 -g (internal compiler
error)
FAIL: gcc.c-torture/execute/longlong.c execution, -O3 -fomit-frame-pointer
FAIL: gcc.c-torture/execute/longlong.c execution, -O3 -g

Dave
--
J. David Anglin ***@nrc-cnrc.gc.ca
National Research Council of Canada (613) 990-0752 (FAX: 952-6602)
Richard Sandiford
2007-10-06 22:11:58 UTC
Permalink
"John David Anglin" <***@hiauly1.hia.nrc.ca> writes:
>> Alexandre Oliva <***@redhat.com> writes:
>> > Anyhow, this patch, that supersedes the previous, fixes it, and with
>> > it the testcase appears to yield the correct result.
>>
>> Thanks, the results after this patch are much better. As you say,
>> 20000113-1.c is well and truly fixed. The only remaining C regressions are:
>
>> FAIL: gcc.c-torture/execute/991118-1.c compilation, -O3 -fomit-frame-pointer (internal compiler error)
>> FAIL: gcc.c-torture/execute/991118-1.c compilation, -O3 -g (internal compiler error)
>
>> Don't worry about the 991118-1.c failure. It's an RTL ICE, and probably
>> a config/mips bug. I'll have a look at it.
>
> I'm also still seeing the failure of 991118-1.c.

Ah, thanks. I did have a quick look at it, and it seemed to be caused
by a missing conversion at the tree level. Expand was being given
a BIT_AND_EXPR with a DImode-typed result and HImode-typed operand.
I didn't try to track down where it was coming from. So it does seem
to be a tree-level bug after all.

Richard
Alexandre Oliva
2007-10-07 23:43:17 UTC
Permalink
On Oct 6, 2007, Richard Sandiford <***@nildram.co.uk> wrote:

> Ah, thanks. I did have a quick look at it, and it seemed to be caused
> by a missing conversion at the tree level. Expand was being given
> a BIT_AND_EXPR with a DImode-typed result and HImode-typed operand.
> I didn't try to track down where it was coming from. So it does seem
> to be a tree-level bug after all.

Yep, fixed in the revised patch below.

It also fixes the endianness problem that was hitting machines that
had BITS_BIG_ENDIAN != BYTES_BIG_ENDIAN, that caused the 20031211-1.c
and other regressions. I remember having noticed at some point in the
development of the bit-field SRA patch that I was using the wrong
endianness macro and changing it, but BITS_BIG_ENDIAN wasn't what I
was looking for. Doh.

I haven't given this patch much testing yet, but the two new changes
are obviously correct and do make progress. I'm starting a test run
on a just-built mipsisa32-elf uberbaum tree, after having finally
figured out why dejagnu wasn't finding newlib (my uberbaum source tree
contained links to gcc and srcware sub-dirs, but site.exp set srcdir
with the pathname of the gcc source tree, rather than that of the
uberbaum tree, doesn't this bite anyone else?)

Ok to install if it passes bootstrap on x86_64-linux-gnu?
John David Anglin
2007-10-08 21:12:42 UTC
Permalink
On Sun, 07 Oct 2007, Alexandre Oliva wrote:

> On Oct 6, 2007, Richard Sandiford <***@nildram.co.uk> wrote:
>
> > Ah, thanks. I did have a quick look at it, and it seemed to be caused
> > by a missing conversion at the tree level. Expand was being given
> > a BIT_AND_EXPR with a DImode-typed result and HImode-typed operand.
> > I didn't try to track down where it was coming from. So it does seem
> > to be a tree-level bug after all.
>
> Yep, fixed in the revised patch below.

The 991118-1.c and longlong.c fails are still present on hppa-unknown-linux-gnu
with the revised patch.

Dave
--
J. David Anglin ***@nrc-cnrc.gc.ca
National Research Council of Canada (613) 990-0752 (FAX: 952-6602)
Alexandre Oliva
2007-10-08 23:50:27 UTC
Permalink
On Oct 8, 2007, John David Anglin <***@hiauly1.hia.nrc.ca> wrote:

> On Sun, 07 Oct 2007, Alexandre Oliva wrote:
>> On Oct 6, 2007, Richard Sandiford <***@nildram.co.uk> wrote:
>>
>> > Ah, thanks. I did have a quick look at it, and it seemed to be caused
>> > by a missing conversion at the tree level. Expand was being given
>> > a BIT_AND_EXPR with a DImode-typed result and HImode-typed operand.
>> > I didn't try to track down where it was coming from. So it does seem
>> > to be a tree-level bug after all.
>>
>> Yep, fixed in the revised patch below.

> The 991118-1.c and longlong.c fails are still present on
> hppa-unknown-linux-gnu with the revised patch.

Since there isn't AFAIK a simulator for hppa-elf and the failure isn't
present on mipsisa32-elf, I'm going to need some help tracking this
down. I've stared at the final tree code, the final RTL and the
generated assembly for 991118-1.c on hppa-linux-gnu for a couple of
hours already, and I haven't been able to locate any error :-(

Could you please let me know which comparison fails, what we're
comparing with and what the computed value is, such that I can stand a
better chance of fixing the problem?


Thanks,

--
Alexandre Oliva http://www.lsd.ic.unicamp.br/~oliva/
FSF Latin America Board Member http://www.fsfla.org/
Red Hat Compiler Engineer aoliva@{redhat.com, gcc.gnu.org}
Free Software Evangelist oliva@{lsd.ic.unicamp.br, gnu.org}
John David Anglin
2007-10-09 00:27:45 UTC
Permalink
> Could you please let me know which comparison fails, what we're
> comparing with and what the computed value is, such that I can stand a
> better chance of fixing the problem?

/home/dave/gnu/gcc-4.3/gcc/gcc/testsuite/gcc.c-torture/execute/991118-1.c: In function 'main':
/home/dave/gnu/gcc-4.3/gcc/gcc/testsuite/gcc.c-torture/execute/991118-1.c:64: internal compiler error: in simplify_subreg, at simplify-rtx.c:4921

Looks like:
tmp = sub (tmp);

Breakpoint 2, simplify_subreg (outermode=SImode, op=0x402e6fd0,
innermode=DImode, byte=0) at ../../gcc/gcc/simplify-rtx.c:4920
4920 gcc_assert (GET_MODE (op) == innermode
(gdb) p debug_rtx (op)
(subreg:HI (reg:SI 291) 2)
(gdb) bt
#0 simplify_subreg (outermode=SImode, op=0x402e6fd0, innermode=DImode, byte=0)
at ../../gcc/gcc/simplify-rtx.c:4920
#1 0x00305414 in simplify_gen_subreg (outermode=SImode, op=0x402e6fd0,
innermode=DImode, byte=0) at ../../gcc/gcc/simplify-rtx.c:5218
#2 0x00167ff4 in operand_subword_force (op=0x7, offset=1076785104,
mode=DImode) at ../../gcc/gcc/emit-rtl.c:1438
#3 0x0027a1cc in expand_binop (mode=DImode, binoptab=0x75db40,
op0=0x402e6fd0, op1=0x401563d8, target=0x402e80f0, unsignedp=1,
methods=OPTAB_LIB_WIDEN) at ../../gcc/gcc/optabs.c:1721
#4 0x00189718 in expand_expr_real_1 (exp=0x402cbe38, target=0x0,
tmode=VOIDmode, modifier=EXPAND_NORMAL, alt_rtl=0x0)
at ../../gcc/gcc/expr.c:9377
#5 0x001997e0 in expand_expr_real (exp=0x402cbe38, target=0x0,
tmode=VOIDmode, modifier=EXPAND_NORMAL, alt_rtl=0x0)
at ../../gcc/gcc/expr.c:7090
#6 0x001999b0 in expand_expr (exp=0x7, target=0x402e6fd0, mode=DImode,
modifier=EXPAND_NORMAL) at ../../gcc/gcc/expr.h:514
#7 0x0018de08 in expand_expr_real_1 (exp=0x402df460, target=0x0,
tmode=VOIDmode, modifier=EXPAND_NORMAL, alt_rtl=0x0)
at ../../gcc/gcc/expr.c:8109
#8 0x001997e0 in expand_expr_real (exp=0x402df460, target=0x0,
tmode=VOIDmode, modifier=EXPAND_NORMAL, alt_rtl=0x0)
at ../../gcc/gcc/expr.c:7090
#9 0x001999b0 in expand_expr (exp=0x7, target=0x402e6fd0, mode=DImode,
modifier=EXPAND_NORMAL) at ../../gcc/gcc/expr.h:514
#10 0x0018e530 in expand_expr_real_1 (exp=0x402cbe60, target=0x0,
tmode=VOIDmode, modifier=EXPAND_NORMAL, alt_rtl=0x0)
at ../../gcc/gcc/expr.c:8914
#11 0x001997e0 in expand_expr_real (exp=0x402cbe60, target=0x0,
tmode=VOIDmode, modifier=EXPAND_NORMAL, alt_rtl=0x0)
at ../../gcc/gcc/expr.c:7090
#12 0x00199c7c in expand_operands (exp0=0x402cbd20, exp1=0x402cbe60,
target=0x0, op0=0xfb472288, op1=0xfb47228c, modifier=EXPAND_NORMAL)
at ../../gcc/gcc/expr.h:514
#13 0x001896d4 in expand_expr_real_1 (exp=0x402cbe88, target=0x0,
tmode=DImode, modifier=EXPAND_NORMAL, alt_rtl=0x0)
at ../../gcc/gcc/expr.c:9370
#14 0x001997e0 in expand_expr_real (exp=0x402cbe88, target=0x0, tmode=DImode,
modifier=EXPAND_NORMAL, alt_rtl=0x0) at ../../gcc/gcc/expr.c:7090
#15 0x001999b0 in expand_expr (exp=0x7, target=0x402e6fd0, mode=DImode,
modifier=EXPAND_NORMAL) at ../../gcc/gcc/expr.h:514
#16 0x0018a1fc in expand_expr_real_1 (exp=0x402d18c0, target=0x402e3080,
tmode=DImode, modifier=EXPAND_NORMAL, alt_rtl=0xfb471fc8)
at ../../gcc/gcc/expr.c:8159
#17 0x001997e0 in expand_expr_real (exp=0x402d18c0, target=0x402e3080,
tmode=DImode, modifier=EXPAND_NORMAL, alt_rtl=0xfb471fc8)
at ../../gcc/gcc/expr.c:7090
#18 0x0019ee20 in store_expr (exp=0x7, target=0x402e3080, call_param_p=0,
nontemporal=<value optimized out>) at ../../gcc/gcc/expr.c:4574
#19 0x001a1590 in expand_assignment (to=0x40152a80, from=0x402d18c0,
nontemporal=0 '\0') at ../../gcc/gcc/expr.c:4357
#20 0x00187c28 in expand_expr_real_1 (exp=0x402d18e0, target=0x0,
tmode=VOIDmode, modifier=EXPAND_NORMAL, alt_rtl=0x0)
at ../../gcc/gcc/expr.c:9130
#21 0x00199950 in expand_expr_real (exp=0x402d18e0, target=0x400d9200,
tmode=VOIDmode, modifier=EXPAND_NORMAL, alt_rtl=0x0)
at ../../gcc/gcc/expr.c:7084
#22 0x003069d4 in expand_expr_stmt (exp=0x7) at ../../gcc/gcc/stmt.c:1357
#23 0x00549bfc in expand_gimple_basic_block (bb=0x400d0800)
at ../../gcc/gcc/cfgexpand.c:1606
#24 0x0054adb4 in tree_expand_cfg () at ../../gcc/gcc/cfgexpand.c:1913
#25 0x00288e3c in execute_one_pass (pass=0x6e9388)
at ../../gcc/gcc/passes.c:1117
#26 0x00289050 in execute_pass_list (pass=0x6e9388)
at ../../gcc/gcc/passes.c:1170
#27 0x003798b8 in tree_rest_of_compilation (fndecl=0x40148e80)
at ../../gcc/gcc/tree-optimize.c:404
#28 0x004e0cc4 in cgraph_expand_function (node=0x40148f00)
at ../../gcc/gcc/cgraphunit.c:1060
#29 0x004e3534 in cgraph_optimize () at ../../gcc/gcc/cgraphunit.c:1123
#30 0x00031b14 in c_write_global_declarations () at ../../gcc/gcc/c-decl.c:8077
#31 0x0031ae3c in toplev_main (argc=<value optimized out>,
argv=<value optimized out>) at ../../gcc/gcc/toplev.c:1052
#32 0x407db750 in __libc_start_main () from /lib/libc.so.6
#33 0x0001ea58 in _start ()


--
J. David Anglin ***@nrc-cnrc.gc.ca
National Research Council of Canada (613) 990-0752 (FAX: 952-6602)
David Daney
2007-10-06 16:00:55 UTC
Permalink
Alexandre Oliva wrote:
> I'm bootstrapping on x86_64-linux-gnu right now. I don't think there
> will be time for me to start a cross toolchain build after that,
> before I leave on another short trip during the weekend, so I'd
> appreciate if you could give it a spin.
>
> If this isn't enough, on Monday, or perhaps even Sunday evening, I'll
> get myself a cross mipsisa32-elf toolchain and test environment and
> look into all the remaining failures you reported.
>
> Hopefully this patch will fix the hppa-linux-gnu Ada compiler
> miscompilation too. Dave, Eric, would you please give it a try and
> let me know? I don't have access to this platform, so it would be
> difficult for me to diagnose the compiler miscompilation :-(
>
> Perhaps running a bootstrap-disabled build, then letting me know about
> regressions it displays? Then I can try to duplicate them locally and
> hopefully they will lead to a fix for the bootstrap problem.
>
> Thanks,
>
>

I bootstrapped with --enable-languages=c on mipsel-linux with

r129029 with your two patches
(gcc-sra-bit-field-ref-fallout-array.patch,
gcc-sra-bit-field-ref-fallout-mips.patch). This fixed all the C
regressions.

David Daney
Daniel Berlin
2007-10-03 14:36:56 UTC
Permalink
On 10/2/07, Richard Guenther <***@gmail.com> wrote:
> On 9/28/07, Alexandre Oliva <***@redhat.com> wrote:
> > On Aug 24, 2007, Alexandre Oliva <***@redhat.com> wrote:
> >
> > > On Jul 6, 2007, Alexandre Oliva <***@redhat.com> wrote:
> > >> Here's the patch, bootstrapped and regression-tested on
> > >> x86_64-linux-gnu and i686-pc-linux-gnu (with an oldish, pre-sccvn
> > >> tree, to be able to test ada as well). Tests on ppc-linux-gnu and
> > >> ppc64-linux-gnu underway.
> >
> > >> Ok to install?
> >
> > > Ping?
> >
> > Ping^2?
> >
> > Tests on ppc* passed back then, BTW.
>
> It seems this was committed now, and causes ncurses build to fail on
> x86_64:
>
Was it actually approved anywhere?
If not, why was it committed?
Diego Novillo
2007-10-03 14:44:35 UTC
Permalink
On 10/3/07, Daniel Berlin <***@dberlin.org> wrote:

> Was it actually approved anywhere?
> If not, why was it committed?
>

[ Sending from my work account. gcc.gnu.org is rejecting mail from my
personal account. ]

Yes, I approved it earlier this week. Alex, if it's causing
widespread breakage again, please back it out. Perhaps this needs
more work, or the patch needs to be redesigned. It is quite
intrusive, after all.
Richard Guenther
2007-10-05 15:03:25 UTC
Permalink
On 10/2/07, Richard Guenther <***@gmail.com> wrote:
> On 9/28/07, Alexandre Oliva <***@redhat.com> wrote:
> > On Aug 24, 2007, Alexandre Oliva <***@redhat.com> wrote:
> >
> > > On Jul 6, 2007, Alexandre Oliva <***@redhat.com> wrote:
> > >> Here's the patch, bootstrapped and regression-tested on
> > >> x86_64-linux-gnu and i686-pc-linux-gnu (with an oldish, pre-sccvn
> > >> tree, to be able to test ada as well). Tests on ppc-linux-gnu and
> > >> ppc64-linux-gnu underway.
> >
> > >> Ok to install?
> >
> > > Ping?
> >
> > Ping^2?
> >
> > Tests on ppc* passed back then, BTW.
>
> It seems this was committed now, and causes ncurses build to fail on
> x86_64:
>
> ../ncurses/./base/lib_addch.c: In function 'render_char':
> ../ncurses/./base/lib_addch.c:541: internal compiler error: in
> bitfield_overlaps_p, at tree-sra.c:2901
> Please submit a full bug report,
> with preprocessed source if appropriate.
> See <http://bugs.opensuse.org/> for instructions.
> make[1]: *** [../obj_s/lib_addch.o] Error 1
>
> I'll file a bugreport once I get hands on preprocessed source.

Btw, this is PR33655.

Richard.
Alexandre Oliva
2007-10-05 16:19:39 UTC
Permalink
On Oct 5, 2007, "Richard Guenther" <***@gmail.com> wrote:

> On 10/2/07, Richard Guenther <***@gmail.com> wrote:
>> It seems this was committed now, and causes ncurses build to fail on
>> x86_64:
> Btw, this is PR33655.

Thanks. Here's a patch that fixes the problem, that I've just started
bootstrapping and regtesting on x86_64-linux-gnu. Ok to install if it
passes?
Richard Guenther
2007-10-05 16:24:19 UTC
Permalink
On 10/5/07, Alexandre Oliva <***@redhat.com> wrote:
> > On 10/2/07, Richard Guenther <***@gmail.com> wrote:
> >> It seems this was committed now, and causes ncurses build to fail on
> >> x86_64:
> > Btw, this is PR33655.
>
> Thanks. Here's a patch that fixes the problem, that I've just started
> bootstrapping and regtesting on x86_64-linux-gnu. Ok to install if it
> passes?

Yes, thanks.

Richard.
Diego Novillo
2007-10-05 16:25:59 UTC
Permalink
On 10/5/07, Alexandre Oliva <***@redhat.com> wrote:

> Ok to install if it passes?

Yes, with a ChangeLog entry and the test from the PR.
Alexandre Oliva
2007-10-05 20:08:00 UTC
Permalink
On Oct 5, 2007, "Diego Novillo" <***@google.com> wrote:

> On 10/5/07, Alexandre Oliva <***@redhat.com> wrote:
>> Ok to install if it passes?

> Yes, with a ChangeLog entry and the test from the PR.

Err, I'm confused. AFAICT the patch I posted had both. Did you find
something else was missing?

--
Alexandre Oliva http://www.lsd.ic.unicamp.br/~oliva/
FSF Latin America Board Member http://www.fsfla.org/
Red Hat Compiler Engineer aoliva@{redhat.com, gcc.gnu.org}
Free Software Evangelist oliva@{lsd.ic.unicamp.br, gnu.org}
Eric Botcazou
2007-10-03 07:47:34 UTC
Permalink
> Ping^2?
>
> Tests on ppc* passed back then, BTW.

It apparently still misbehaves in some cases for Ada and has introduced an
ACATS regression at -O2 on x86-64 (c37213b):

<bb 4>:
# D.857_26 = PHI <D.857_11(3), D.857_27(7)>
D.860_25 = (<unnamed-signed:64>) D.857_26;
_init = VIEW_CONVERT_EXPR<struct p__cons[D.859:D.861]>(*x.1_3)[D.860_25]{lb:
D.859_24 sz: 12};
SR.57_139 = VIEW_CONVERT_EXPR<UNSIGNED_64>(_init.c1);
_init.c1 = VIEW_CONVERT_EXPR<struct p__cons__T2b>(SR.57_139);
D.976 = VIEW_CONVERT_EXPR<struct p__rec>(4294967297);
SR.45_155 = VIEW_CONVERT_EXPR<UNSIGNED_64>(D.976);
D.978.c1 = VIEW_CONVERT_EXPR<struct p__cons__T2b>(SR.45_155);
D.978.d3 = 1;
D.960 = D.978;
VIEW_CONVERT_EXPR<struct p__cons[D.859:D.861]>(*x.1_3)[D.860_25]{lb:
D.859_24 sz: 12} = D.960;

<bb 5>:
if (D.858_23 == D.857_26)
goto <bb 6>;
else
goto <bb 7>;

is now turned into

<bb 4>:
# SR.128_80 = PHI <SR.128_164(3), SR.128_197(7)>
# D.857_26 = PHI <D.857_11(3), D.857_27(7)>
D.860_25 = (<unnamed-signed:64>) D.857_26;
_init$d3_69 = VIEW_CONVERT_EXPR<struct p__cons[D.859:D.861]>(*x.1_3)
[D.860_25]{lb: D.859_24 sz: 12}.d3;
SR.131_67 = VIEW_CONVERT_EXPR<UNSIGNED_64>(VIEW_CONVERT_EXPR<struct
p__cons[D.859:D.861]>(*x.1_3)[D.860_25]{lb: D.859_24 sz: 12}.c1);
_init$c1$B0F64_65 = SR.131_67;
SR.132_63 = _init$c1$B0F64_65;
_init.c1 = VIEW_CONVERT_EXPR<struct p__cons__T2b>(SR.132_63);
SR.57_139 = VIEW_CONVERT_EXPR<UNSIGNED_64>(_init.c1);
SR.133_6 = SR.57_139;
_init$c1$B0F64_154 = SR.133_6;
D.976 = VIEW_CONVERT_EXPR<struct p__rec>(4294967297);
SR.45_155 = VIEW_CONVERT_EXPR<UNSIGNED_64>(D.976);
SR.134_142 = SR.45_155;
SR.130_124 = SR.134_142;
SR.129_148 = 1;
SR.135_147 = SR.128_80 & 4294967295;
SR.138_83 = SR.130_124 >> 32;
SR.136_141 = (p__sm___XDLU_1__10) SR.138_83;
SR.139_140 = (UNSIGNED_64) SR.136_141;
SR.140_138 = SR.139_140 << 32;
SR.128_92 = SR.135_147 | SR.140_138;
SR.141_201 = SR.128_92 & 0x0ffffffff00000000;
SR.142_200 = (p__sm___XDLU_1__10) SR.130_124;
SR.144_199 = (UNSIGNED_64) SR.142_200;
SR.128_137 = SR.141_201 | SR.144_199;
SR.128_197 = SR.130_124;
SR.127_136 = SR.129_148;
SR.145_135 = SR.128_197;

<bb 5>:
VIEW_CONVERT_EXPR<struct p__cons[D.859:D.861]>(*x.1_3)[D.860_25]{lb:
D.859_24 sz: 12}.c1 = VIEW_CONVERT_EXPR<struct p__cons__T2b>(SR.145_135);
VIEW_CONVERT_EXPR<struct p__cons[D.859:D.861]>(*x.1_3)[D.860_25]{lb:
D.859_24 sz: 12}.d3 = SR.127_136;
if (D.858_23 == D.857_26)
goto <bb 6>;
else
goto <bb 7>;


The correctness problem is that the last statement of BB 4 is not
tree_could_throw_p anymore, yielding:

p.adb: In function 'P':
p.adb:1: error: statement marked for throw, but doesn't
SR.145_135 = SR.128_197;

The optimization problem is the junk now present at the end of BB 4.

Testcase attached, compile at -O2.

--
Eric Botcazou
Eric Botcazou
2007-10-03 21:35:59 UTC
Permalink
> It apparently still misbehaves in some cases for Ada

It miscompiles the Ada compiler on PA/Linux (PR ada/33634).

--
Eric Botcazou
Eric Botcazou
2007-10-05 06:26:23 UTC
Permalink
> Ugh, there was a thinko in the test for whether the original BB/stmt
> could throw. Thanks for the testcase, this patch fixes it.

Partially, I presume it doesn't do anything for the pessimization.

--
Eric Botcazou
Alexandre Oliva
2007-10-05 16:02:21 UTC
Permalink
On Oct 5, 2007, Eric Botcazou <***@adacore.com> wrote:

>> Ugh, there was a thinko in the test for whether the original BB/stmt
>> could throw. Thanks for the testcase, this patch fixes it.

> Partially, I presume it doesn't do anything for the pessimization.

What pessimization?

--
Alexandre Oliva http://www.lsd.ic.unicamp.br/~oliva/
FSF Latin America Board Member http://www.fsfla.org/
Red Hat Compiler Engineer aoliva@{redhat.com, gcc.gnu.org}
Free Software Evangelist oliva@{lsd.ic.unicamp.br, gnu.org}
Eric Botcazou
2007-10-07 09:03:33 UTC
Permalink
> What pessimization?

The complex calculation now introduced by SRA:

SR.135_147 = SR.128_80 & 4294967295;
SR.138_83 = SR.130_124 >> 32;
SR.136_141 = (p__sm___XDLU_1__10) SR.138_83;
SR.139_140 = (UNSIGNED_64) SR.136_141;
SR.140_138 = SR.139_140 << 32;
SR.128_92 = SR.135_147 | SR.140_138;
SR.141_201 = SR.128_92 & 0x0ffffffff00000000;
SR.142_200 = (p__sm___XDLU_1__10) SR.130_124;
SR.144_199 = (UNSIGNED_64) SR.142_200;
SR.128_137 = SR.141_201 | SR.144_199;

It turns out that SR.128_137 is dead so the whole stuff is eliminated by the
first subsequent DCE pass, but I'm worried that it might not always be so.

I confirm that the verification failure of c37213b is fixed by the patch you
posted in your previous message, thanks!

--
Eric Botcazou
Alexandre Oliva
2007-10-07 23:56:44 UTC
Permalink
On Oct 7, 2007, Eric Botcazou <***@adacore.com> wrote:

>> What pessimization?
> The complex calculation now introduced by SRA:

It's not really introduced, it's just made explicit earlier, in the
tree level, such that we can better optimize it. I'll give you that
this isn't the simplest way to model operations on 32-bit words in
64-bit variables, but in the end we'd get the same code either way, at
least for variables that are not pushed to memory.

> I confirm that the verification failure of c37213b is fixed by the patch you
> posted in your previous message, thanks!

Thanks,

--
Alexandre Oliva http://www.lsd.ic.unicamp.br/~oliva/
FSF Latin America Board Member http://www.fsfla.org/
Red Hat Compiler Engineer aoliva@{redhat.com, gcc.gnu.org}
Free Software Evangelist oliva@{lsd.ic.unicamp.br, gnu.org}
Eric Botcazou
2007-10-08 05:14:46 UTC
Permalink
> It's not really introduced, it's just made explicit earlier, in the
> tree level, such that we can better optimize it. I'll give you that
> this isn't the simplest way to model operations on 32-bit words in
> 64-bit variables, but in the end we'd get the same code either way, at
> least for variables that are not pushed to memory.

OK, thanks for the clarification.

Could you install the aforementioned patchlet? FWIW it has successfully
passed a complete bootstrap + Ada regression testing on x86-64.

--
Eric Botcazou
Alexandre Oliva
2007-10-08 20:29:25 UTC
Permalink
On Oct 8, 2007, Eric Botcazou <***@adacore.com> wrote:

>> It's not really introduced, it's just made explicit earlier, in the
>> tree level, such that we can better optimize it. I'll give you that
>> this isn't the simplest way to model operations on 32-bit words in
>> 64-bit variables, but in the end we'd get the same code either way, at
>> least for variables that are not pushed to memory.

> OK, thanks for the clarification.

> Could you install the aforementioned patchlet? FWIW it has successfully
> passed a complete bootstrap + Ada regression testing on x86-64.

I'm not sure which of these two patches you're talking about, and I'm
not entitled to install any of them without approval, unless you think
they fall in the obviously-correct rule (which the first one might, I
guess, but AFAICT it somehow failed to make to the list)

http://gcc.gnu.org/ml/gcc-patches/2007-10/msg00440.html
http://gcc.gnu.org/ml/gcc-patches/2007-10/msg00395.html

--
Alexandre Oliva http://www.lsd.ic.unicamp.br/~oliva/
FSF Latin America Board Member http://www.fsfla.org/
Red Hat Compiler Engineer aoliva@{redhat.com, gcc.gnu.org}
Free Software Evangelist oliva@{lsd.ic.unicamp.br, gnu.org}
Eric Botcazou
2007-10-08 21:02:19 UTC
Permalink
> I'm not sure which of these two patches you're talking about, and I'm
> not entitled to install any of them without approval, unless you think
> they fall in the obviously-correct rule (which the first one might, I
> guess, but AFAICT it somehow failed to make to the list)

Yes, it's the first one and yes, I'd think that it falls into the
obviously-correct-for-the-sake-of-consistency rule.

--
Eric Botcazou
Alexandre Oliva
2007-10-08 23:56:20 UTC
Permalink
On Oct 8, 2007, Eric Botcazou <***@adacore.com> wrote:

>> I'm not sure which of these two patches you're talking about, and I'm
>> not entitled to install any of them without approval, unless you think
>> they fall in the obviously-correct rule (which the first one might, I
>> guess, but AFAICT it somehow failed to make to the list)

> Yes, it's the first one and yes, I'd think that it falls into the
> obviously-correct-for-the-sake-of-consistency rule.

Ok, then, it's in.

--
Alexandre Oliva http://www.lsd.ic.unicamp.br/~oliva/
FSF Latin America Board Member http://www.fsfla.org/
Red Hat Compiler Engineer aoliva@{redhat.com, gcc.gnu.org}
Free Software Evangelist oliva@{lsd.ic.unicamp.br, gnu.org}
Alexandre Oliva
2007-10-04 18:36:24 UTC
Permalink
On Oct 3, 2007, Eric Botcazou <***@adacore.com> wrote:

> It apparently still misbehaves in some cases for Ada and has introduced an
> ACATS regression at -O2 on x86-64 (c37213b):

> The correctness problem is that the last statement of BB 4 is not
> tree_could_throw_p anymore, yielding:

Ugh, there was a thinko in the test for whether the original BB/stmt
could throw. Thanks for the testcase, this patch fixes it. I haven't
bootstrapped it yet, I'm on the road with a little computing
horsepower and limited network connection. I should be back home
tonight.
Diego Novillo
2007-09-29 17:36:37 UTC
Permalink
On 7/6/07, Alexandre Oliva <***@redhat.com> wrote:

> Here's the patch, bootstrapped and regression-tested on
> x86_64-linux-gnu and i686-pc-linux-gnu (with an oldish, pre-sccvn
> tree, to be able to test ada as well). Tests on ppc-linux-gnu and
> ppc64-linux-gnu underway.
>
> Ok to install?

Yes. Thanks.
Daniel Berlin
2007-04-20 17:06:35 UTC
Permalink
On 4/19/07, Bernd Schmidt <***@t-online.de> wrote:
> Daniel Berlin wrote:
> > On 4/19/07, Bernd Schmidt <***@t-online.de> wrote: }
> >>
> >> What strikes me as odd here is the following piece of code:
> >> y.k = y$k_9(D);
> >> y = sD;
> >> The (D) apparently stands for "default definition" which I take to mean
> >> it's considered uninitialized.
> >
> > Yes
> >> I don't know why SRA thinks it needs to
> >> insert this statement, as all of y is overwritten immediately after.
> > SRA does not eliminate dead stores, the dead store elimination does.
> >
> > SRA also does not try to discover which pieces of the structure are
> > actually used in the function prior to performing SRA, so it will
> > generate uninitialized uses when it does element by element copies.
> >
> > This has been the way SRA has worked forever. I noticed it when I
> > wrote create_structure_field_vars, since after SRA, we end up with
> > "used" fields that have no structure field vars.
> > I mentioned this, and was told it was going to be too expensive to
> > track which pieces are used. After trying to do the same thing in
> > create_structure_vars, I agree.
> >
> > It would also be a mistake to try to do significant optimization of
> > copy in/out placement directly in SRA. Again, that is why we have PRE
> > and DSE. If they are not moving/eliminating these things, they need
> > to be fixed.
>
> I'm not convinced it can't be done with changes to SRA - this behaviour
> does seem to have been introduced by a recent patch by aoliva.

So, looking again at your testcase, it looks like you should be right.
We definitely don't need to *copy into the scalarized variable* before
a store to the unscalarized version, ever.
But this won't get you less uninitialized variables in general
generated from SRA, just as an FYI.
Bernd Schmidt
2007-04-28 17:46:24 UTC
Permalink
Daniel Berlin wrote:
> It would also be a mistake to try to do significant optimization of
> copy in/out placement directly in SRA. Again, that is why we have PRE
> and DSE. If they are not moving/eliminating these things, they need
> to be fixed.

I've had a look at why these statements are not eliminated. I figure
that dce is the pass that should get rid of them, but it doesn't. From
the dumps:

# SFT.70_14 = VDEF <SFT.70_13(D)>
y.k = y$k_9(D);
# VUSE <sD_15(D)>
# SFT.70_19 = VDEF <SFT.70_14>
# SFT.71_20 = VDEF <SFT.71_16(D)>
# SFT.72_21 = VDEF <SFT.72_17(D)>
# SFT.73_22 = VDEF <SFT.73_18(D)>
y = sD;

Note how all the VDEFs have an input argument, which causes us to
believe that the second statement depends on the first. AFAICT, these
VDEFs should have no arguments, but the compiler always generates them
with exactly one input. I've tried to fix that, but so far the patch I
have is only useful as an SSA learning tool for myself, as it creates
lots of failures elsewhere in the compiler...

Any other ideas how to address this?


Bernd
--
This footer brought to you by insane German lawmakers.
Analog Devices GmbH Wilhelm-Wagenfeld-Str. 6 80807 Muenchen
Sitz der Gesellschaft Muenchen, Registergericht Muenchen HRB 40368
Geschaeftsfuehrer Thomas Wessel, William A. Martin, Margaret Seif
Richard Guenther
2007-04-28 20:48:02 UTC
Permalink
On 4/28/07, Bernd Schmidt <***@t-online.de> wrote:
> Daniel Berlin wrote:
> > It would also be a mistake to try to do significant optimization of
> > copy in/out placement directly in SRA. Again, that is why we have PRE
> > and DSE. If they are not moving/eliminating these things, they need
> > to be fixed.
>
> I've had a look at why these statements are not eliminated. I figure
> that dce is the pass that should get rid of them, but it doesn't. From
> the dumps:
>
> # SFT.70_14 = VDEF <SFT.70_13(D)>
> y.k = y$k_9(D);
> # VUSE <sD_15(D)>
> # SFT.70_19 = VDEF <SFT.70_14>
> # SFT.71_20 = VDEF <SFT.71_16(D)>
> # SFT.72_21 = VDEF <SFT.72_17(D)>
> # SFT.73_22 = VDEF <SFT.73_18(D)>
> y = sD;
>
> Note how all the VDEFs have an input argument, which causes us to
> believe that the second statement depends on the first. AFAICT, these
> VDEFs should have no arguments, but the compiler always generates them
> with exactly one input. I've tried to fix that, but so far the patch I
> have is only useful as an SSA learning tool for myself, as it creates
> lots of failures elsewhere in the compiler...
>
> Any other ideas how to address this?

These are all the default SSA_NAMEs of the SFTs (note the (D)). If
y is a local variable DCE should "ignore" them (those uses are undefined
in that case). So the only relevant VDEF for DCE/DSE is

# SFT.70_19 = VDEF <SFT.70_14>

So it's a DCE problem, not a problem of the SSA form.

Richard.
>
> Bernd
> --
> This footer brought to you by insane German lawmakers.
> Analog Devices GmbH Wilhelm-Wagenfeld-Str. 6 80807 Muenchen
> Sitz der Gesellschaft Muenchen, Registergericht Muenchen HRB 40368
> Geschaeftsfuehrer Thomas Wessel, William A. Martin, Margaret Seif
>
Daniel Berlin
2007-04-28 21:24:36 UTC
Permalink
On 4/28/07, Richard Guenther <***@gmail.com> wrote:
> On 4/28/07, Bernd Schmidt <***@t-online.de> wrote:
> > Daniel Berlin wrote:
> > > It would also be a mistake to try to do significant optimization of
> > > copy in/out placement directly in SRA. Again, that is why we have PRE
> > > and DSE. If they are not moving/eliminating these things, they need
> > > to be fixed.
> >
> > I've had a look at why these statements are not eliminated. I figure
> > that dce is the pass that should get rid of them, but it doesn't. From
> > the dumps:
> >
> > # SFT.70_14 = VDEF <SFT.70_13(D)>
> > y.k = y$k_9(D);
> > # VUSE <sD_15(D)>
> > # SFT.70_19 = VDEF <SFT.70_14>
> > # SFT.71_20 = VDEF <SFT.71_16(D)>
> > # SFT.72_21 = VDEF <SFT.72_17(D)>
> > # SFT.73_22 = VDEF <SFT.73_18(D)>
> > y = sD;
> >
> > Note how all the VDEFs have an input argument, which causes us to
> > believe that the second statement depends on the first. AFAICT, these
> > VDEFs should have no arguments, but the compiler always generates them
> > with exactly one input. I've tried to fix that, but so far the patch I
> > have is only useful as an SSA learning tool for myself, as it creates
> > lots of failures elsewhere in the compiler...
> >
> > Any other ideas how to address this?
>
> These are all the default SSA_NAMEs of the SFTs (note the (D)). If
> y is a local variable DCE should "ignore" them (those uses are undefined
> in that case). So the only relevant VDEF for DCE/DSE is
>
> # SFT.70_19 = VDEF <SFT.70_14>
>
Right.

I would imagine *DSE* should eliminate this, not DCE, because it's
store over store.
However, this does show a problem with our SSA form, actually.
Our VDEF's treat the RHS argument of the VDEF as an implicit use (and
will avoid adding explicit vuses of those arguments), besides just the
next part of the def-def chain to follow. However, in cases like the
above, there *is no use of SFT.70_14* in that statement, only a def.
We don't have a way to specify that, so DCE will mark the statement
defining SFT.70_14 as necessary.

DSE should do better, since it was taught how to deal with this
problem by seeing if the store is to the same place, or subsumed by
the later store with no intervening uses.
Richard Guenther
2007-04-29 09:38:02 UTC
Permalink
On 4/28/07, Daniel Berlin <***@dberlin.org> wrote:
> On 4/28/07, Richard Guenther <***@gmail.com> wrote:
> > On 4/28/07, Bernd Schmidt <***@t-online.de> wrote:
> > > Daniel Berlin wrote:
> > > > It would also be a mistake to try to do significant optimization of
> > > > copy in/out placement directly in SRA. Again, that is why we have PRE
> > > > and DSE. If they are not moving/eliminating these things, they need
> > > > to be fixed.
> > >
> > > I've had a look at why these statements are not eliminated. I figure
> > > that dce is the pass that should get rid of them, but it doesn't. From
> > > the dumps:
> > >
> > > # SFT.70_14 = VDEF <SFT.70_13(D)>
> > > y.k = y$k_9(D);
> > > # VUSE <sD_15(D)>
> > > # SFT.70_19 = VDEF <SFT.70_14>
> > > # SFT.71_20 = VDEF <SFT.71_16(D)>
> > > # SFT.72_21 = VDEF <SFT.72_17(D)>
> > > # SFT.73_22 = VDEF <SFT.73_18(D)>
> > > y = sD;
> > >
> > > Note how all the VDEFs have an input argument, which causes us to
> > > believe that the second statement depends on the first. AFAICT, these
> > > VDEFs should have no arguments, but the compiler always generates them
> > > with exactly one input. I've tried to fix that, but so far the patch I
> > > have is only useful as an SSA learning tool for myself, as it creates
> > > lots of failures elsewhere in the compiler...
> > >
> > > Any other ideas how to address this?
> >
> > These are all the default SSA_NAMEs of the SFTs (note the (D)). If
> > y is a local variable DCE should "ignore" them (those uses are undefined
> > in that case). So the only relevant VDEF for DCE/DSE is
> >
> > # SFT.70_19 = VDEF <SFT.70_14>
> >
> Right.
>
> I would imagine *DSE* should eliminate this, not DCE, because it's
> store over store.
> However, this does show a problem with our SSA form, actually.
> Our VDEF's treat the RHS argument of the VDEF as an implicit use (and
> will avoid adding explicit vuses of those arguments), besides just the
> next part of the def-def chain to follow. However, in cases like the
> above, there *is no use of SFT.70_14* in that statement, only a def.
> We don't have a way to specify that, so DCE will mark the statement
> defining SFT.70_14 as necessary.
>
> DSE should do better, since it was taught how to deal with this
> problem by seeing if the store is to the same place, or subsumed by
> the later store with no intervening uses.

Now the problem with DSE seems to be that for

(gdb) call debug_generic_expr (stmt)
# SFT.84_24 = VDEF <SFT.84_10(D)> { SFT.84 }
y.k = y$k_23(D)
(gdb) call debug_generic_expr (use_stmt)
# VUSE <sD_9(D)> { sD }
# SFT.84_14 = VDEF <SFT.84_24>
# SFT.85_15 = VDEF <SFT.85_11(D)>
# SFT.86_16 = VDEF <SFT.86_12(D)>
# SFT.87_17 = VDEF <SFT.87_13(D)> { SFT.84 SFT.85 SFT.86 SFT.87 }
y = sD

at tree-ssa-dse.c:604 (the call to dse_partial_kill_p):

/* If we have precisely one immediate use at this point, then we may
have found redundant store. Make sure that the stores are to
the same memory location. This includes checking that any
SSA-form variables in the address will have the same values. */
if (use_p != NULL_USE_OPERAND_P
&& bitmap_bit_p (dse_gd->stores, get_stmt_uid (use_stmt))
&& (operand_equal_p (GIMPLE_STMT_OPERAND (stmt, 0),
GIMPLE_STMT_OPERAND (use_stmt, 0), 0)
|| dse_partial_kill_p (stmt, dse_gd))
&& memory_address_same (stmt, use_stmt))

dse_partial_kill_p does not do what its same suggests, but first asserts
that stmt stores to all of the aggregate (which it of course does not).
It _looks_ like that maybe dse_partial_kill_p (use_stmt, dse_gd) was
intended here. Otherwise DSE detects stmt as possible dead store.

Richard.
Richard Guenther
2007-04-29 09:47:10 UTC
Permalink
On 4/29/07, Richard Guenther <***@gmail.com> wrote:
> On 4/28/07, Daniel Berlin <***@dberlin.org> wrote:
> > On 4/28/07, Richard Guenther <***@gmail.com> wrote:
> > > On 4/28/07, Bernd Schmidt <***@t-online.de> wrote:
> > > > Daniel Berlin wrote:
> > > > > It would also be a mistake to try to do significant optimization of
> > > > > copy in/out placement directly in SRA. Again, that is why we have PRE
> > > > > and DSE. If they are not moving/eliminating these things, they need
> > > > > to be fixed.
> > > >
> > > > I've had a look at why these statements are not eliminated. I figure
> > > > that dce is the pass that should get rid of them, but it doesn't. From
> > > > the dumps:
> > > >
> > > > # SFT.70_14 = VDEF <SFT.70_13(D)>
> > > > y.k = y$k_9(D);
> > > > # VUSE <sD_15(D)>
> > > > # SFT.70_19 = VDEF <SFT.70_14>
> > > > # SFT.71_20 = VDEF <SFT.71_16(D)>
> > > > # SFT.72_21 = VDEF <SFT.72_17(D)>
> > > > # SFT.73_22 = VDEF <SFT.73_18(D)>
> > > > y = sD;
> > > >
> > > > Note how all the VDEFs have an input argument, which causes us to
> > > > believe that the second statement depends on the first. AFAICT, these
> > > > VDEFs should have no arguments, but the compiler always generates them
> > > > with exactly one input. I've tried to fix that, but so far the patch I
> > > > have is only useful as an SSA learning tool for myself, as it creates
> > > > lots of failures elsewhere in the compiler...
> > > >
> > > > Any other ideas how to address this?
> > >
> > > These are all the default SSA_NAMEs of the SFTs (note the (D)). If
> > > y is a local variable DCE should "ignore" them (those uses are undefined
> > > in that case). So the only relevant VDEF for DCE/DSE is
> > >
> > > # SFT.70_19 = VDEF <SFT.70_14>
> > >
> > Right.
> >
> > I would imagine *DSE* should eliminate this, not DCE, because it's
> > store over store.
> > However, this does show a problem with our SSA form, actually.
> > Our VDEF's treat the RHS argument of the VDEF as an implicit use (and
> > will avoid adding explicit vuses of those arguments), besides just the
> > next part of the def-def chain to follow. However, in cases like the
> > above, there *is no use of SFT.70_14* in that statement, only a def.
> > We don't have a way to specify that, so DCE will mark the statement
> > defining SFT.70_14 as necessary.
> >
> > DSE should do better, since it was taught how to deal with this
> > problem by seeing if the store is to the same place, or subsumed by
> > the later store with no intervening uses.
>
> Now the problem with DSE seems to be that for
>
> (gdb) call debug_generic_expr (stmt)
> # SFT.84_24 = VDEF <SFT.84_10(D)> { SFT.84 }
> y.k = y$k_23(D)
> (gdb) call debug_generic_expr (use_stmt)
> # VUSE <sD_9(D)> { sD }
> # SFT.84_14 = VDEF <SFT.84_24>
> # SFT.85_15 = VDEF <SFT.85_11(D)>
> # SFT.86_16 = VDEF <SFT.86_12(D)>
> # SFT.87_17 = VDEF <SFT.87_13(D)> { SFT.84 SFT.85 SFT.86 SFT.87 }
> y = sD
>
> at tree-ssa-dse.c:604 (the call to dse_partial_kill_p):
>
> /* If we have precisely one immediate use at this point, then we may
> have found redundant store. Make sure that the stores are to
> the same memory location. This includes checking that any
> SSA-form variables in the address will have the same values. */
> if (use_p != NULL_USE_OPERAND_P
> && bitmap_bit_p (dse_gd->stores, get_stmt_uid (use_stmt))
> && (operand_equal_p (GIMPLE_STMT_OPERAND (stmt, 0),
> GIMPLE_STMT_OPERAND (use_stmt, 0), 0)
> || dse_partial_kill_p (stmt, dse_gd))
> && memory_address_same (stmt, use_stmt))
>
> dse_partial_kill_p does not do what its same suggests, but first asserts
> that stmt stores to all of the aggregate (which it of course does not).
> It _looks_ like that maybe dse_partial_kill_p (use_stmt, dse_gd) was
> intended here. Otherwise DSE detects stmt as possible dead store.

Oh, and that doesn't work because we don't visit

# SFT.84D.1820_19 = VDEF <SFT.84D.1820_28>
# SFT.85D.1821_20 = VDEF <SFT.85D.1821_15>
# SFT.86D.1822_21 = VDEF <SFT.86D.1822_16>
# SFT.87D.1823_22 = VDEF <SFT.87D.1823_17>
yD.1648 = retmeD (yD.1648) [return slot optimization];

and thus don't record y as dse_whole_aggregate_clobbered_p.

Richard.
Richard Guenther
2007-04-29 10:00:00 UTC
Permalink
On 4/29/07, Richard Guenther <***@gmail.com> wrote:
> On 4/29/07, Richard Guenther <***@gmail.com> wrote:
> > Now the problem with DSE seems to be that for
> >
> > (gdb) call debug_generic_expr (stmt)
> > # SFT.84_24 = VDEF <SFT.84_10(D)> { SFT.84 }
> > y.k = y$k_23(D)
> > (gdb) call debug_generic_expr (use_stmt)
> > # VUSE <sD_9(D)> { sD }
> > # SFT.84_14 = VDEF <SFT.84_24>
> > # SFT.85_15 = VDEF <SFT.85_11(D)>
> > # SFT.86_16 = VDEF <SFT.86_12(D)>
> > # SFT.87_17 = VDEF <SFT.87_13(D)> { SFT.84 SFT.85 SFT.86 SFT.87 }
> > y = sD
> >
> > at tree-ssa-dse.c:604 (the call to dse_partial_kill_p):
> >
> > /* If we have precisely one immediate use at this point, then we may
> > have found redundant store. Make sure that the stores are to
> > the same memory location. This includes checking that any
> > SSA-form variables in the address will have the same values. */
> > if (use_p != NULL_USE_OPERAND_P
> > && bitmap_bit_p (dse_gd->stores, get_stmt_uid (use_stmt))
> > && (operand_equal_p (GIMPLE_STMT_OPERAND (stmt, 0),
> > GIMPLE_STMT_OPERAND (use_stmt, 0), 0)
> > || dse_partial_kill_p (stmt, dse_gd))
> > && memory_address_same (stmt, use_stmt))
> >
> > dse_partial_kill_p does not do what its same suggests, but first asserts
> > that stmt stores to all of the aggregate (which it of course does not).
> > It _looks_ like that maybe dse_partial_kill_p (use_stmt, dse_gd) was
> > intended here. Otherwise DSE detects stmt as possible dead store.
>
> Oh, and that doesn't work because we don't visit
>
> # SFT.84D.1820_19 = VDEF <SFT.84D.1820_28>
> # SFT.85D.1821_20 = VDEF <SFT.85D.1821_15>
> # SFT.86D.1822_21 = VDEF <SFT.86D.1822_16>
> # SFT.87D.1823_22 = VDEF <SFT.87D.1823_17>
> yD.1648 = retmeD (yD.1648) [return slot optimization];
>
> and thus don't record y as dse_whole_aggregate_clobbered_p.

We don't record whole-aggregate sets in that bitmap anyway.

Richard.
Bernd Schmidt
2007-04-20 12:56:59 UTC
Permalink
Bernd Schmidt wrote:
> execute/20040709-2.c is currently failing on the Blackfin at -O3. This
> is a problem in reload: these days register allocation can assign the
> same hard register to two pseudos even if their lifetimes overlap if one
> of them is uninitialized. If you then have an insn where the
> uninitialized pseudo dies, and reload needs an output reload, it can
> choose the dying register for it. Doing so, it clobbers the other live
> pseudo that shares the hardregister with the uninitialized register.
>
> We've fixed this bug twice before, once in find_dummy_reload, and in
> push_reload. This adds the same check to combine_reloads, where we can
> hit the same problem.

I've bootstrapped & tested this on the 4.2 branch (i686-linux) and
committed there too, as 123995. Probably won't get around to doing 4.1
for a week or so.


Bernd
--
This footer brought to you by insane German lawmakers.
Analog Devices GmbH Wilhelm-Wagenfeld-Str. 6 80807 Muenchen
Registergericht Muenchen HRB 40368
Geschaeftsfuehrer Thomas Wessel, Vincent Roche, Joseph E. McDonough
Eric Botcazou
2007-04-20 22:01:29 UTC
Permalink
> I've bootstrapped & tested this on the 4.2 branch (i686-linux) and
> committed there too, as 123995. Probably won't get around to doing 4.1
> for a week or so.

Is the problem a regression on the 4.1 branch too? If not, please refrain
from touching reload at this point.

--
Eric Botcazou
Bernd Schmidt
2007-04-28 15:45:45 UTC
Permalink
Eric Botcazou wrote:
>> I've bootstrapped & tested this on the 4.2 branch (i686-linux) and
>> committed there too, as 123995. Probably won't get around to doing 4.1
>> for a week or so.
>
> Is the problem a regression on the 4.1 branch too? If not, please refrain
> from touching reload at this point.

4.1 has Vlad's accurate liveness code, and the two other bug fixes I
mentioned, so there's no reason to believe the problem does not affect
the branch. The patch is conservative, and not intrusive. I've
bootstrapped and regression tested it on i686-linux. Ok to commit?


Bernd
--
This footer brought to you by insane German lawmakers.
Analog Devices GmbH Wilhelm-Wagenfeld-Str. 6 80807 Muenchen
Sitz der Gesellschaft Muenchen, Registergericht Muenchen HRB 40368
Geschaeftsfuehrer Thomas Wessel, William A. Martin, Margaret Seif
Eric Botcazou
2007-04-28 16:02:38 UTC
Permalink
> 4.1 has Vlad's accurate liveness code, and the two other bug fixes I
> mentioned, so there's no reason to believe the problem does not affect
> the branch. The patch is conservative, and not intrusive. I've
> bootstrapped and regression tested it on i686-linux. Ok to commit?

Well, you have more authority than anyone else over this, so it's up to you to
answer your own question, I'm afraid. :-)

--
Eric Botcazou
Bernd Schmidt
2007-04-29 00:00:30 UTC
Permalink
Bernd Schmidt wrote:
> Bernd Schmidt wrote:
>> execute/20040709-2.c is currently failing on the Blackfin at -O3.
>> This is a problem in reload: these days register allocation can assign
>> the same hard register to two pseudos even if their lifetimes overlap
>> if one of them is uninitialized. If you then have an insn where the
>> uninitialized pseudo dies, and reload needs an output reload, it can
>> choose the dying register for it. Doing so, it clobbers the other
>> live pseudo that shares the hardregister with the uninitialized register.
>>
>> We've fixed this bug twice before, once in find_dummy_reload, and in
>> push_reload. This adds the same check to combine_reloads, where we
>> can hit the same problem.
>
> I've bootstrapped & tested this on the 4.2 branch (i686-linux) and
> committed there too, as 123995. Probably won't get around to doing 4.1
> for a week or so.

Now committed on the 4.1 branch as 124268 after the same tests.


Bernd
--
This footer brought to you by insane German lawmakers.
Analog Devices GmbH Wilhelm-Wagenfeld-Str. 6 80807 Muenchen
Sitz der Gesellschaft Muenchen, Registergericht Muenchen HRB 40368
Geschaeftsfuehrer Thomas Wessel, William A. Martin, Margaret Seif
Loading...