Discussion:
fix for strct-pack-1.c regressions
Aldy Hernandez
2002-03-13 02:01:51 UTC
Permalink
strct-pack-1.c was failing on ppc -mlittle, and apparently on other
targets too, according to David Anglin:

http://gcc.gnu.org/ml/gcc-patches/2001-12/msg01035.html

The scheduler was moving a memory load before the instruction that
dominated it, because the alias information did not take into account that
structure fields may be distinct yet still overlap.

rth suggested this and ok'ed it everywhere. Committing to branch and
trunk.

2002-03-13 Aldy Hernandez <***@redhat.com>

* expmed.c (store_bit_field): Reset alias set for memory.
(extract_bit_field): Same.


Index: expmed.c
===================================================================
RCS file: /cvs/uberbaum/gcc/expmed.c,v
retrieving revision 1.111
diff -c -p -r1.111 expmed.c
*** expmed.c 2002/03/05 11:10:36 1.111
--- expmed.c 2002/03/13 01:54:28
*************** store_bit_field (str_rtx, bitsize, bitnu
*** 392,397 ****
--- 392,406 ----
}
}

+ /* We may be accessing data outside the field, which means
+ we can alias adjacent data. */
+ if (GET_CODE (op0) == MEM)
+ {
+ op0 = shallow_copy_rtx (op0);
+ set_mem_alias_set (op0, 0);
+ set_mem_expr (op0, 0);
+ }
+
/* If OP0 is a register, BITPOS must count within a word.
But as we have it, it counts within whatever size OP0 now has.
On a bigendian machine, these are not the same, so convert. */
*************** extract_bit_field (str_rtx, bitsize, bit
*** 1068,1073 ****
--- 1077,1091 ----
abort ();
}
}
+
+ /* We may be accessing data outside the field, which means
+ we can alias adjacent data. */
+ if (GET_CODE (op0) == MEM)
+ {
+ op0 = shallow_copy_rtx (op0);
+ set_mem_alias_set (op0, 0);
+ set_mem_expr (op0, 0);
+ }

/* ??? We currently assume TARGET is at least as big as BITSIZE.
If that's wrong, the solution is to test for it and set TARGET to 0
Richard Kenner
2002-03-13 03:49:52 UTC
Permalink
Post by Aldy Hernandez
The scheduler was moving a memory load before the instruction that
dominated it, because the alias information did not take into account that
structure fields may be distinct yet still overlap.

rth suggested this and ok'ed it everywhere. Committing to branch and
trunk.

This looks wrong to me. Shouldn't the alias set be that of the record
type, not set 0?

Indeed, thinking about it more, why is the alias set being changed from
that of the type? We should only set the alias set to that of a field if
the field is addressable, and bitfields are not.

So I think something else is wrong here.
Richard Henderson
2002-03-13 05:56:18 UTC
Permalink
Post by Richard Kenner
This looks wrong to me. Shouldn't the alias set be that of the record
type, not set 0?
No. We may access memory outside the structure entirely.

Agreed that a more comprehensive solution is to examine exactly
the kinds of memory references we're going to produce and determine
whether or not they fall outside the field and/or structure. Such
changes are not appropriate for 3.1; I suggested to Aldy that he
might want to look into this for mainline, but he's under a bit of
deadline pressure at the moment.
Post by Richard Kenner
Indeed, thinking about it more, why is the alias set being changed from
that of the type?
Please examine the test case.


r~
John David Anglin
2002-03-13 05:04:51 UTC
Permalink
Post by Aldy Hernandez
* expmed.c (store_bit_field): Reset alias set for memory.
(extract_bit_field): Same.
This didn't fix the fails on hppa2.0w-hp-hpux11.11.

Dave
--
J. David Anglin ***@nrc.ca
National Research Council of Canada (613) 990-0752 (FAX: 952-6605)
Richard Kenner
2002-03-13 12:45:26 UTC
Permalink
Post by Richard Henderson
No. We may access memory outside the structure entirely.
Are you sure? I didn't think we ever did that (well, except for the stuff
in alpha.c).
Post by Richard Henderson
Please examine the test case.
This is the one in gcc.c-torture/execute?
Richard Henderson
2002-03-13 19:16:01 UTC
Permalink
Post by Richard Kenner
Post by Richard Henderson
No. We may access memory outside the structure entirely.
Are you sure? I didn't think we ever did that (well except for the stuff
in alpha.c).
I think so. You'd have to have nested structures to see it,
since there has to be alignment present to take advantage.
Perhaps something like

struct foo {
long x;
short y;
struct bar { int a; } __attribute__((packed)) z;
} f;

f.z.a = 0;

In that case I don't see anything preventing us from using
the alignment of F to work around the misalignment of A. So
we couldn't simply strip off one layer of component_ref and
have the right alias set.
Post by Richard Kenner
This is the one in gcc.c-torture/execute?
Yes. IIRC powerpc-eabi -mlittle showed the problem; Aldy
can confirm the proper configuration.


r~
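
The nested-structure layout in the message above can be checked with a few lines of C. This is only a sketch under stated assumptions: it relies on the GCC/Clang `packed` attribute, on a 4-byte `int` and a 2-byte `short`, and the helper names `containing_word_offset` and `word_overlaps_y` are made up for illustration. It shows that the int-aligned word containing the start of `f.z.a` also covers the neighbouring field `y`, which is why an aligned wide access to `z.a` can touch memory outside `z`.

```c
#include <assert.h>
#include <stddef.h>

/* Layout from the message above; packed is a GCC/Clang extension. */
struct foo {
  long x;
  short y;
  struct bar { int a; } __attribute__((packed)) z;
};

/* Offset of the int-aligned word that contains the first byte of z.
   Hypothetical helper, not part of GCC. */
static size_t containing_word_offset(void)
{
  size_t off = offsetof(struct foo, z);
  return off - off % sizeof(int);
}

/* Nonzero if that aligned word also covers bytes that belong to y. */
static int word_overlaps_y(void)
{
  size_t word = containing_word_offset();
  size_t y_begin = offsetof(struct foo, y);
  size_t y_end = y_begin + sizeof(short);
  return word < y_end && word + sizeof(int) > y_begin;
}
```

Calling `word_overlaps_y()` from a test program should return nonzero on common ABIs, since `z` is placed directly after `y` at an offset that is not a multiple of `sizeof(int)`.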
Aldy Hernandez
2002-03-14 01:01:58 UTC
Permalink
Post by Richard Henderson
Post by Richard Kenner
This is the one in gcc.c-torture/execute?
Yes, strct-pack-1.c

typedef struct
{
short s __attribute__ ((aligned(2), packed));
double d __attribute__ ((aligned(2), packed));
} TRIAL;

s and d end up sharing different bits of the same memory and
having to do ANDs/ORs to get the values in correctly.

The scheduler moves things around and ends up putting the store before
a load of the same memory:

z <= load from [x]
twiddle z's bits
z => store into [x]

the store is moved up before the load because we calculate wrong
dependence information.
Post by Richard Henderson
Yes. IIRC powerpc-eabi -mlittle showed the problem; Aldy
can confirm the proper configuration.
powerpc-eabialtivec
-mlittle -mstrict-align -O[2s]

cheers
aldy
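
As a rough illustration of why `s` and `d` share memory, the sketch below declares the same TRIAL type and measures how much of `d` falls inside the first 4-byte word of the structure. It assumes a compiler that accepts the GCC attribute syntax and a 2-byte `short`; the helper `d_bytes_in_first_word` is a hypothetical name used only here.

```c
#include <assert.h>
#include <stddef.h>

/* Same declaration as in strct-pack-1.c, as quoted above. */
typedef struct
{
  short s __attribute__ ((aligned (2), packed));
  double d __attribute__ ((aligned (2), packed));
} TRIAL;

/* Number of bytes of field d that lie inside the first 4-byte word
   of the structure.  Hypothetical helper for illustration. */
static size_t d_bytes_in_first_word(void)
{
  size_t d_off = offsetof(TRIAL, d);
  return d_off < 4 ? 4 - d_off : 0;
}
```

With `d` starting at offset 2, a 4-byte access at offset 0 reads or writes all of `s` plus the first two bytes of `d`, so a word-sized store that updates part of `d` must first load the word to preserve `s`.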
John David Anglin
2002-03-13 16:12:02 UTC
Permalink
Post by Aldy Hernandez
* expmed.c (store_bit_field): Reset alias set for memory.
(extract_bit_field): Same.
This is the PA code that sets the struct TRIAL:

ldi 1,%r20
ldw -112(%r30),%r21 <=== 1 (reads 4 bytes)
ldil L'16384,%r19
ldh -102(%r30),%r22
ldo 48(%r19),%r19
depi 0,31,16,%r21
ldo -112(%r30),%r26
or %r21,%r19,%r21
sth %r20,-112(%r30) <=== 2 (writes 2 bytes)
stw %r22,-104(%r30)
stw %r21,-112(%r30) <=== 3 (writes 4 bytes)
.CALL ARGW0=GR
bl check,%r2
stw %r0,-108(%r30)

2 should come before 1. However, since the order is wrong, 3 clobbers 2.

Dave
--
J. David Anglin ***@nrc.ca
National Research Council of Canada (613) 990-0752 (FAX: 952-6605)
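
The effect of the ordering described above can be modelled on a small byte buffer. This is a simplified sketch, not the PA code itself: bytes 0-1 stand in for `s`, bytes 2-3 for the first half-word of `d`, the value 0x4030 is arbitrary, and every function name is hypothetical. It compares the intended order (store `s`, load the word, merge, store the word) with the mis-scheduled order in which the word load is hoisted above the store to `s`.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Bytes 0-1 model field s; bytes 2-3 model the first half-word of d. */
enum { S_OFF = 0, D_OFF = 2 };

/* 2-byte store to s (step 2 in the listing above). */
static void store_s(unsigned char *mem, uint16_t s)
{
  memcpy(mem + S_OFF, &s, sizeof s);
}

/* 4-byte load covering both s and part of d (step 1). */
static uint32_t load_word(const unsigned char *mem)
{
  uint32_t w;
  memcpy(&w, mem, sizeof w);
  return w;
}

/* Merge a new d half-word into a previously loaded word and write the
   whole word back (step 3). */
static void store_merged_word(unsigned char *mem, uint32_t w, uint16_t d_half)
{
  unsigned char tmp[4];
  memcpy(tmp, &w, sizeof w);
  memcpy(tmp + D_OFF, &d_half, sizeof d_half);
  memcpy(mem, tmp, sizeof tmp);
}

static uint16_t read_s(const unsigned char *mem)
{
  uint16_t s;
  memcpy(&s, mem + S_OFF, sizeof s);
  return s;
}

/* Intended order: 2, 1, 3.  The value of s survives. */
static uint16_t run_in_order(void)
{
  unsigned char mem[4] = { 0 };
  store_s(mem, 1);
  uint32_t w = load_word(mem);
  store_merged_word(mem, w, 0x4030);
  return read_s(mem);
}

/* Mis-scheduled order: 1, 2, 3.  The word store writes back the stale
   copy of s taken before the 2-byte store. */
static uint16_t run_hoisted_load(void)
{
  unsigned char mem[4] = { 0 };
  uint32_t w = load_word(mem);
  store_s(mem, 1);
  store_merged_word(mem, w, 0x4030);
  return read_s(mem);
}
```

In this model `run_in_order()` leaves `s` equal to 1, while `run_hoisted_load()` leaves it at 0, which matches the "3 clobbers 2" observation.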
John David Anglin
2002-03-13 16:34:22 UTC
Permalink
Post by John David Anglin
Post by Aldy Hernandez
* expmed.c (store_bit_field): Reset alias set for memory.
(extract_bit_field): Same.
This didn't fix the fails on hppa2.0w-hp-hpux11.11.
I'm wrong. Somehow cvs seems to have downloaded an inconsistent
source tree at my end. The patch entry was in the ChangeLog but
expmed.c was not patched. Looking at this test on a hppa-linux
build that started a little later, the test now passes at -O2 and -Os.

Sorry for the confusion,
Dave
--
J. David Anglin ***@nrc.ca
National Research Council of Canada (613) 990-0752 (FAX: 952-6605)
Richard Kenner
2002-03-13 19:20:12 UTC
Permalink
Post by Richard Henderson
You'd have to have nested structures to see it,
That makes sense.
Post by Richard Henderson
In that case I don't see anything preventing us from using
the alignment of F to work around the misalignment of A. So
we couldn't simply strip off one layer of component_ref and
have the right alias set.
Right, but you could strip off *two*. However, I don't think we
should be trying to use the alias set of a mis-aligned field, so I'm
not sure I understand the original problem.
Post by Richard Henderson
Post by Richard Kenner
This is the one in gcc.c-torture/execute?
Yes.
That doesn't seem to have nested structures, so I'm confused.
Richard Henderson
2002-03-13 19:39:43 UTC
Permalink
Post by Richard Kenner
Right, but you could strip off *two*.
Yes, but that (as with adjusting mem_attr.expr) is going to take a
reasonable-sized hunk of logic to coordinate with what
store/extract_bit_field is actually going to do.
Post by Richard Kenner
However, I don't think we should be trying to use the alias set
of a mis-aligned field, so I'm not sure I understand the original problem.
Digging up Aldy's first message on this...

The problem actually observed is

(insn 12 11 15 (set (mem/s:HI (plus:SI (reg/f:SI 31 r31)
(const_int 16 [0x10])) [6 trial.s+0 S2 A128])
(reg:HI 116)) 296 {*rs6000.md:8555} (insn_list 11 (nil))
(expr_list:REG_DEAD (reg:HI 116)
(nil)))

(insn 16 15 18 (set (reg:SI 118)
(zero_extend:SI (mem/s:HI (plus:SI (reg/f:SI 31 r31)
(const_int 16 [0x10])) [4 trial.d+-2 S2 A128])))
30 {*rs6000.md:2318} (nil)
(nil))

being considered not to alias, and thus mis-scheduled.

I consider [trial.d-2] to be incorrect. You also find that the
alias set is set as for short and int, which also prevents these
from being considered to alias. Thus my suggestion to (in the
short term) smash both of these values to known-safe settings.


r~

BTW, it's "-mstrict-align -mlittle -O2" on powerpc-eabialtivec
that is supposed to show this failure.
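
The reasoning above can be summarised with a toy conflict test. This is a deliberately simplified model and not GCC's real alias-set code, which also tracks subset relationships between sets; the function name `toy_alias_sets_conflict` is hypothetical. The one property it is meant to show is that alias set 0 conflicts with everything, so resetting a MEM to set 0 forces the scheduler to treat the two accesses in the dump (sets 6 and 4) as possibly overlapping.

```c
#include <assert.h>

/* Toy model: two memory references may alias if either is in alias
   set 0 or if both are in the same set.  Real GCC additionally treats
   a set and its subsets as conflicting; that is omitted here. */
static int toy_alias_sets_conflict(int set1, int set2)
{
  return set1 == 0 || set2 == 0 || set1 == set2;
}
```

Under this model the pair (6, 4) from the two insns does not conflict, while (0, 0), (0, 4) and (6, 0) all do.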
Richard Kenner
2002-03-14 03:44:25 UTC
Permalink
Post by Aldy Hernandez
s and d end up sharing different bits of the same memory and
having to do ANDs/ORs to get the values in correctly.
I believe that DECL_NONADDRESSABLE_P should be set in such cases, and I think
that if it is, that solves this problem without any other change.
r***@redhat.com
2002-03-14 23:45:03 UTC
Permalink
Post by Richard Kenner
I believe that DECL_NONADDRESSABLE_P should be set in such cases, and I think
that if it is, that solves this problem without any other change.
Technically, you can still take the address of these fields.

We *should* be generating a pointer-to-unaligned-int rather
than pointer-to-int, but we can't represent that properly
at the moment.


r~
Richard Kenner
2002-03-15 01:37:33 UTC
Permalink
Post by r***@redhat.com
Technically, you can still take the address of these fields.

Perhaps, though I don't think we've defined the __attribute__
extension well enough to be sure you can. Certainly a dereference to
it wouldn't work.
John David Anglin
2002-03-28 23:44:31 UTC
Permalink
The PA linux port suffers from this problem as well, and I presume that the
The following has been applied to the trunk and 3.1 branch. It fixes
the same problem encountered on the sparc. Tested with no regressions
on hppa-unknown-linux-gnu.

Dave
--
J. David Anglin ***@nrc.ca
National Research Council of Canada (613) 990-0752 (FAX: 952-6605)

2002-03-28 John David Anglin <***@hiauly1.hia.nrc.ca>

* pa-linux.h (LOCAL_LABEL_PREFIX): Define.

--- pa-linux.h.orig Wed Mar 13 12:02:16 2002
+++ pa-linux.h Thu Mar 28 11:12:52 2002
@@ -148,6 +148,10 @@
} \
while (0)

+/* We want local labels to start with period if made with asm_fprintf. */
+#undef LOCAL_LABEL_PREFIX
+#define LOCAL_LABEL_PREFIX "."
+
/* Define these to generate the Linux/ELF/SysV style of internal
labels all the time - i.e. to be compatible with
ASM_GENERATE_INTERNAL_LABEL in <elfos.h>. Compare these with the
Richard Henderson
2002-03-29 00:18:15 UTC
Permalink
Post by John David Anglin
I was wondering why we don't just add
+#undef LOCAL_LABEL_PREFIX
+#define LOCAL_LABEL_PREFIX "."
to elfos.h. The default assumption in elfos.h is that local labels
start with '.'.
For mainline, I'd support doing that if we also move
the ASM_GENERATE_INTERNAL_LABEL and ASM_OUTPUT_INTERNAL_LABEL
macros as well.


r~
John David Anglin
2002-03-29 01:03:29 UTC
Permalink
Post by Richard Henderson
to elfos.h. The default assumption in elfos.h is that local labels
start with '.'.
For mainline, I'd support doing that if we also move
the ASM_GENERATE_INTERNAL_LABEL and ASM_OUTPUT_INTERNAL_LABEL
macros as well.
They are already in elfos.h so they don't need moving. That's why
I made the suggestion.

Dave
--
J. David Anglin ***@nrc.ca
National Research Council of Canada (613) 990-0752 (FAX: 952-6605)
Richard Henderson
2002-03-29 01:05:28 UTC
Permalink
Post by John David Anglin
They are already in elfos.h so they don't need moving. That's why
I made the suggestion.
Ah. So: yes, basically any place that has ASM*INTERNAL* and
doesn't define LOCAL_LABEL_PREFIX is probably wrong. The
exceptions are those few a.out systems that really don't
use a prefix.


r~
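
To make the role of the prefix concrete, here is a small stand-alone sketch of how a local label name might be assembled from LOCAL_LABEL_PREFIX. It is not GCC's asm_fprintf or ASM_GENERATE_INTERNAL_LABEL; the helper `format_local_label` and its arguments are hypothetical, and the `#ifndef` default of "." simply mirrors the ELF convention discussed above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Default to the ELF convention unless something defined it earlier. */
#ifndef LOCAL_LABEL_PREFIX
#define LOCAL_LABEL_PREFIX "."
#endif

/* Hypothetical helper: write PREFIX + NAME + NUM into BUF and return
   the length snprintf reports. */
static int format_local_label(char *buf, size_t size,
                              const char *name, unsigned num)
{
  return snprintf(buf, size, "%s%s%u", LOCAL_LABEL_PREFIX, name, num);
}
```

If the label-generating macros hard-code a leading period while LOCAL_LABEL_PREFIX is left empty, labels produced by the two paths would disagree, which is the mismatch the thread is describing.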
John David Anglin
2002-05-08 20:50:05 UTC
Permalink
I think that using RTL_FLAG_CHECKn is wrong. It's not an error
if the rtx is not a JUMP_INSN or a CALL_INSN. However, the result
In looking at dbr_schedule, I see that INSN_ANNULLED_BRANCH_P and
INSN_FROM_TARGET_P apply to any active insn.

Dave
--
J. David Anglin ***@nrc.ca
National Research Council of Canada (613) 990-0752 (FAX: 952-6605)
John David Anglin
2002-06-03 21:02:36 UTC
Permalink
By trial and error, I have determined that the following bootstrap failure
* rtl.h (CC0_P): New.
* gcse.c (cprop_jump): Use it with single_set. Tweak dump text.
(cprop_insn): Allow any mode register; use CC0_P. CSE out single_set.
(bypass_block): Save old dest block for dump text.
(bypass_conditional_jumps): Allow any mode register; use CC0_P.
Allow only true SET insns, not single_set.
./xgcc -B./ -B/home/dave/opt/gnu/hppa-linux/bin/ -isystem /home/dave/opt/gnu/hppa-linux/include -isystem /home/dave/opt/gnu/hppa-linux/sys-include -O2 -DIN_GCC -W -Wall -Wwrite-strings -Wstrict-prototypes -Wmissing-prototypes -isystem ./include -I. -I. -I../../gcc/gcc -I../../gcc/gcc/. -I../../gcc/gcc/config -I../../gcc/gcc/../include -g0 -finhibit-size-directive -fno-inline-functions -fno-exceptions -fno-zero-initialized-in-bss \
-c ../../gcc/gcc/crtstuff.c -DCRT_BEGIN \
-o crtbegin.o
../../gcc/gcc/crtstuff.c:282: internal error: Segmentation fault
I see that this is fixed (just a typo).

What actually started the hunt is the following ICE which is still present:

./xgcc -B./ -B/home/dave/opt/gnu/hppa-linux/bin/ -isystem /home/dave/opt/gnu/hppa-linux/include -isystem /home/dave/opt/gnu/hppa-linux/sys-include -O2 -DIN_GCC -W -Wall -Wwrite-strings -Wstrict-prototypes -Wmissing-prototypes -isystem ./include -fPIC -DELF=1 -DLINUX=1 -g -DHAVE_GTHR_DEFAULT -DIN_LIBGCC2 -D__GCC_FLOAT_NOT_NEEDED -I. -I. -I../../gcc/gcc -I../../gcc/gcc/. -I../../gcc/gcc/config -I../../gcc/gcc/../include -DL_muldi3 -c ../../gcc/gcc/libgcc2.c -o libgcc/./_muldi3.o
../../gcc/gcc/libgcc2.c: In function `__muldi3':
../../gcc/gcc/libgcc2.c:367: virtual array insn_addresses[211]: element 229 out of bounds in pa_output_function_prologue, at config/pa/pa.c:3185

The above error occurs on the following line:

total_code_bytes += INSN_ADDRESSES (INSN_UID (get_last_insn ()));

INSN_UID (get_last_insn ()) is 229. Something has messed up the size of
the array insn_addresses. This could be the above mentioned patch or some
other subsequent patch.

Dave
--
J. David Anglin ***@nrc.ca
National Research Council of Canada (613) 990-0752 (FAX: 952-6605)
l***@redhat.com
2002-06-04 13:34:25 UTC
Permalink
Post by John David Anglin
By trial and error, I have determined that the following bootstrap failure
* rtl.h (CC0_P): New.
* gcse.c (cprop_jump): Use it with single_set. Tweak dump text.
(cprop_insn): Allow any mode register; use CC0_P. CSE out single_set.
(bypass_block): Save old dest block for dump text.
(bypass_conditional_jumps): Allow any mode register; use CC0_P.
Allow only true SET insns, not single_set.
./xgcc -B./ -B/home/dave/opt/gnu/hppa-linux/bin/ -isystem /home/dave/opt/gnu/hppa-linux/include -isystem /home/dave/opt/gnu/hppa-linux/sys-include -O2 -DIN_GCC -W -Wall -Wwrite-strings -Wstrict-prototypes -Wmissing-prototypes -isystem ./include -I. -I. -I../../gcc/gcc -I../../gcc/gcc/. -I../../gcc/gcc/config -I../../gcc/gcc/../include -g0 -finhibit-size-directive -fno-inline-functions -fno-exceptions -fno-zero-initialized-in-bss \
-c ../../gcc/gcc/crtstuff.c -DCRT_BEGIN \
-o crtbegin.o
../../gcc/gcc/crtstuff.c:282: internal error: Segmentation fault
I see that this is fixed (just a typo).
./xgcc -B./ -B/home/dave/opt/gnu/hppa-linux/bin/ -isystem /home/dave/opt/gnu/hppa-linux/include -isystem /home/dave/opt/gnu/hppa-linux/sys-include -O2 -DIN_GCC -W -Wall -Wwrite-strings -Wstrict-prototypes -Wmissing-prototypes -isystem ./include -fPIC -DELF=1 -DLINUX=1 -g -DHAVE_GTHR_DEFAULT -DIN_LIBGCC2 -D__GCC_FLOAT_NOT_NEEDED -I. -I. -I../../gcc/gcc -I../../gcc/gcc/. -I../../gcc/gcc/config -I../../gcc/gcc/../include -DL_muldi3 -c ../../gcc/gcc/libgcc2.c -o libgcc/./_muldi3.o
../../gcc/gcc/libgcc2.c:367: virtual array insn_addresses[211]: element 229 out of bounds in pa_output_function_prologue, at config/pa/pa.c:3185
total_code_bytes += INSN_ADDRESSES (INSN_UID (get_last_insn ()));
INSN_UID (get_last_insn ()) is 229. Something has messed up the size of
the array insn_addresses. This could be the above mentioned patch or some
other subsequent patch.
I think Jan mentioned that this was his bug and that he was working on it.

jeff
Jan Hubicka
2002-06-04 13:45:15 UTC
Permalink
Post by l***@redhat.com
Post by John David Anglin
total_code_bytes += INSN_ADDRESSES (INSN_UID (get_last_insn ()));
INSN_UID (get_last_insn ()) is 229. Something has messed up the size of
the array insn_addresses. This could be the above mentioned patch or some
other subsequent patch.
I think Jan mentioned that this was his bug and that he was working on it.
I was asking for opinions on the proper fix on the mailing list.
The problem is that I now delete NOTE_INSN_BLOCK_* notes during
optimization and re-emit them in final.c when debug output is done.
As a result those notes have no computed INSN_ADDRESS, which is
de facto OK, given that a note is not an instruction and so does not have
any address.

Code like this can be fixed easily by finding the first/last non-note
instruction in the chain.

I am not sure whether we want to fix all incarnations in the
port-dependent code or find a way to initialize INSN_ADDRESS. That can be done
either by moving the re-emit code before shorten_branches, which has the
drawback that the existence/nonexistence of notes may result in different
code output on -g/non-g compilation, or by simply adding gaps into the
INSN_ADDRESS array, which is also not so fortunate due to resizing
issues.

What do you think?
Honza
John David Anglin
2002-06-04 15:55:41 UTC
Permalink
Post by Jan Hubicka
Post by John David Anglin
total_code_bytes += INSN_ADDRESSES (INSN_UID (get_last_insn ()));
INSN_UID (get_last_insn ()) is 229. Something has messed up the size of
the array insn_addresses. This could be the above mentioned patch or some
Code like this can be fixed easily by finding the first/last non-note
instruction in the chain.
I am testing the enclosed patch per your suggestion.
Post by Jan Hubicka
I am not sure whether we want to fix all incarnations in the
port-dependent code or find a way to initialize INSN_ADDRESS. That can be done
either by moving the re-emit code before shorten_branches, which has the
drawback that the existence/nonexistence of notes may result in different
code output on -g/non-g compilation, or by simply adding gaps into the
INSN_ADDRESS array, which is also not so fortunate due to resizing
issues.
There only appears to be a couple of other ports with similar code to
determine the size of a function. Most other uses of INSN_ADDRESSES
are probably ok. Possibly, the function "get_last_nonnote_insn" could
be put in emit-rtl.c. Then, it wouldn't be too onerous to change all
the ports.

Dave
--
J. David Anglin ***@nrc.ca
National Research Council of Canada (613) 990-0752 (FAX: 952-6605)

2002-06-04 John David Anglin <***@hiauly1.hia.nrc.ca>

* pa.c (get_last_nonnote_insn): New function.
(pa_output_function_prologue): Use it.

Index: config/pa/pa.c
===================================================================
RCS file: /cvsroot/gcc/gcc/gcc/config/pa/pa.c,v
retrieving revision 1.167
diff -u -3 -p -r1.167 pa.c
--- config/pa/pa.c 31 May 2002 18:01:13 -0000 1.167
+++ config/pa/pa.c 4 Jun 2002 15:35:15 -0000
@@ -95,6 +95,7 @@ hppa_fpstore_bypass_p (out_insn, in_insn
#endif
#endif

+static rtx get_last_nonnote_insn PARAMS((void));
static inline rtx force_mode PARAMS ((enum machine_mode, rtx));
static void pa_combine_instructions PARAMS ((rtx));
static int pa_can_combine_p PARAMS ((rtx, rtx, rtx, int, rtx, rtx, rtx));
@@ -3116,6 +3117,24 @@ compute_frame_size (size, fregs_live)
return (fsize + STACK_BOUNDARY - 1) & ~(STACK_BOUNDARY - 1);
}

+/* Return the last nonnote insn emitted in current sequence or current
+ function. This routine looks inside SEQUENCEs. */
+
+static rtx
+get_last_nonnote_insn ()
+{
+ rtx insn = get_last_insn ();
+
+ while (insn)
+ {
+ insn = previous_insn (insn);
+ if (insn == 0 || GET_CODE (insn) != NOTE)
+ break;
+ }
+
+ return insn;
+}
+
/* Generate the assembly code for function entry. FILE is a stdio
stream to output the code to. SIZE is an int: how many units of
temporary storage to allocate.
@@ -3182,7 +3201,7 @@ pa_output_function_prologue (file, size)
{
unsigned int old_total = total_code_bytes;

- total_code_bytes += INSN_ADDRESSES (INSN_UID (get_last_insn ()));
+ total_code_bytes += INSN_ADDRESSES (INSN_UID (get_last_nonnote_insn ()));
total_code_bytes += FUNCTION_BOUNDARY / BITS_PER_UNIT;

/* Be prepared to handle overflows. */
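
As written, the loop in the patch above steps to the previous insn before it tests the current one, so it would never return the starting insn even when that insn is not a note. Whether that is intended is not stated in the thread. The sketch below shows the alternative of testing first and stepping second, on a mock insn chain. It does not use GCC's rtx types; `struct toy_insn`, `enum toy_code` and `toy_last_nonnote` are hypothetical stand-ins for illustration only.

```c
#include <assert.h>
#include <stddef.h>

/* Mock of an insn chain: each node is either a real insn or a note. */
enum toy_code { TOY_INSN, TOY_NOTE };

struct toy_insn {
  enum toy_code code;
  int uid;
  struct toy_insn *prev;
};

/* Walk backwards from LAST and return the first node that is not a
   note.  Each node is tested before stepping, so LAST itself is
   returned when it is a real insn.  Returns NULL if only notes (or
   nothing) are found. */
static struct toy_insn *toy_last_nonnote(struct toy_insn *last)
{
  struct toy_insn *insn = last;

  while (insn != NULL && insn->code == TOY_NOTE)
    insn = insn->prev;

  return insn;
}
```

For a chain ending in a note the function returns the insn before it, and for a chain ending in a real insn it returns that insn unchanged.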
Jan Hubicka
2002-06-04 16:11:58 UTC
Permalink
Post by John David Anglin
Post by Jan Hubicka
Post by John David Anglin
total_code_bytes += INSN_ADDRESSES (INSN_UID (get_last_insn ()));
INSN_UID (get_last_insn ()) is 229. Something has messed up the size of
the array insn_addresses. This could be the above mentioned patch or some
Code like this can be fixed easily by finding the first/last non-note
instruction in the chain.
I am testing the enclosed patch per your suggestion.
Post by Jan Hubicka
I am not sure whether we want to fix all incarnations in the
port-dependent code or find a way to initialize INSN_ADDRESS. That can be done
either by moving the re-emit code before shorten_branches, which has the
drawback that the existence/nonexistence of notes may result in different
code output on -g/non-g compilation, or by simply adding gaps into the
INSN_ADDRESS array, which is also not so fortunate due to resizing
issues.
There only appears to be a couple of other ports with similar code to
determine the size of a function. Most other uses of INSN_ADDRESSES
are probably ok. Possibly, the function "get_last_nonnote_insn" could
be put in emit-rtl.c. Then, it wouldn't be too onerous to change all
the ports.
Yes, this looks like sensible approach to me. Thanks for fixing it!

Honza
John David Anglin
2002-06-06 04:30:17 UTC
Permalink
Post by Jan Hubicka
Post by John David Anglin
There only appears to be a couple of other ports with similar code to
determine the size of a function. Most other uses of INSN_ADDRESSES
are probably ok. Possibly, the function "get_last_nonnote_insn" could
be put in emit-rtl.c. Then, it wouldn't be too onerous to change all
the ports.
Yes, this looks like sensible approach to me. Thanks for fixing it!
This is the fix that I would like to apply for the problem of determining
the size of a function under hppa-linux and hppa64-hp-hpux11. I believe
that the patch is functionally equivalent to what I proposed before for
just the PA port. I have moved get_last_nonnote_insn to emit-rtl.c
and created a corresponding get_first_nonnote_insn for the avr port.
There is some question in my mind whether the latter function is actually
necessary, but I don't want to second-guess the code in avr.c.

Bootstrapped and regression tested under hppa-linux.

OK for mainline?

Dave
--
J. David Anglin ***@nrc.ca
National Research Council of Canada (613) 990-0752 (FAX: 952-6605)

2002-06-04 John David Anglin <***@hiauly1.hia.nrc.ca>

* emit-rtl.c (get_first_nonnote_insn, get_last_nonnote_insn): New
functions.
* rtl.h (get_first_nonnote_insn, get_last_nonnote_insn): Declare.
* avr/avr.c (avr_output_function_epilogue): Use above to determine
function size.
* pa/pa.c (pa_output_function_prologue): Likewise.

Index: emit-rtl.c
===================================================================
RCS file: /cvsroot/gcc/gcc/gcc/emit-rtl.c,v
retrieving revision 1.269
diff -u -3 -p -r1.269 emit-rtl.c
--- emit-rtl.c 3 Jun 2002 01:13:14 -0000 1.269
+++ emit-rtl.c 4 Jun 2002 16:53:47 -0000
@@ -2743,6 +2743,42 @@ get_last_insn_anywhere ()
return 0;
}

+/* Return the first nonnote insn emitted in current sequence or current
+ function. This routine looks inside SEQUENCEs. */
+
+rtx
+get_first_nonnote_insn ()
+{
+ rtx insn = first_insn;
+
+ while (insn)
+ {
+ insn = next_insn (insn);
+ if (insn == 0 || GET_CODE (insn) != NOTE)
+ break;
+ }
+
+ return insn;
+}
+
+/* Return the last nonnote insn emitted in current sequence or current
+ function. This routine looks inside SEQUENCEs. */
+
+rtx
+get_last_nonnote_insn ()
+{
+ rtx insn = last_insn;
+
+ while (insn)
+ {
+ insn = previous_insn (insn);
+ if (insn == 0 || GET_CODE (insn) != NOTE)
+ break;
+ }
+
+ return insn;
+}
+
/* Return a number larger than any instruction's uid in this function. */

int
Index: rtl.h
===================================================================
RCS file: /cvsroot/gcc/gcc/gcc/rtl.h,v
retrieving revision 1.352
diff -u -3 -p -r1.352 rtl.h
--- rtl.h 2 Jun 2002 21:09:43 -0000 1.352
+++ rtl.h 4 Jun 2002 16:53:48 -0000
@@ -1447,6 +1447,8 @@ extern rtx get_insns PARAMS ((void));
extern const char *get_insn_name PARAMS ((int));
extern rtx get_last_insn PARAMS ((void));
extern rtx get_last_insn_anywhere PARAMS ((void));
+extern rtx get_first_nonnote_insn PARAMS ((void));
+extern rtx get_last_nonnote_insn PARAMS ((void));
extern void start_sequence PARAMS ((void));
extern void push_to_sequence PARAMS ((rtx));
extern void end_sequence PARAMS ((void));
Index: config/avr/avr.c
===================================================================
RCS file: /cvsroot/gcc/gcc/gcc/config/avr/avr.c,v
retrieving revision 1.72
diff -u -3 -p -r1.72 avr.c
--- config/avr/avr.c 1 Jun 2002 23:33:47 -0000 1.72
+++ config/avr/avr.c 4 Jun 2002 16:53:48 -0000
@@ -749,8 +749,8 @@ avr_output_function_epilogue (file, size
interrupt_func_p = interrupt_function_p (current_function_decl);
signal_func_p = signal_function_p (current_function_decl);
main_p = MAIN_NAME_P (DECL_NAME (current_function_decl));
- function_size = (INSN_ADDRESSES (INSN_UID (get_last_insn ()))
- - INSN_ADDRESSES (INSN_UID (get_insns ())));
+ function_size = (INSN_ADDRESSES (INSN_UID (get_last_nonnote_insn ()))
+ - INSN_ADDRESSES (INSN_UID (get_first_nonnote_insn ())));
function_size += jump_tables_size;
live_seq = sequent_regs_live ();
minimize = (TARGET_CALL_PROLOGUES
Index: config/pa/pa.c
===================================================================
RCS file: /cvsroot/gcc/gcc/gcc/config/pa/pa.c,v
retrieving revision 1.167
diff -u -3 -p -r1.167 pa.c
--- config/pa/pa.c 31 May 2002 18:01:13 -0000 1.167
+++ config/pa/pa.c 4 Jun 2002 16:53:49 -0000
@@ -3182,7 +3182,7 @@ pa_output_function_prologue (file, size)
{
unsigned int old_total = total_code_bytes;

- total_code_bytes += INSN_ADDRESSES (INSN_UID (get_last_insn ()));
+ total_code_bytes += INSN_ADDRESSES (INSN_UID (get_last_nonnote_insn ()));
total_code_bytes += FUNCTION_BOUNDARY / BITS_PER_UNIT;

/* Be prepared to handle overflows. */
l***@redhat.com
2002-06-06 06:23:10 UTC
Permalink
Post by John David Anglin
Post by Jan Hubicka
Post by John David Anglin
There only appears to be a couple of other ports with similar code to
determine the size of a function. Most other uses of INSN_ADDRESSES
are probably ok. Possibly, the function "get_last_nonnote_insn" could
be put in emit-rtl.c. Then, it wouldn't be too onerous to change all
the ports.
Yes, this looks like sensible approach to me. Thanks for fixing it!
This is the fix that I would like to apply for the problem of determining
the size of a function under hppa-linux and hppa64-hp-hpux11. I believe
that the patch is functionally equivalent to what I proposed before for
just the PA port. I have moved get_last_nonnote_insn to emit-rtl.c
and created a corresponding get_first_nonnote_insn for the avr port.
There is some question in my mind whether the latter function is actually
necessary, but I don't want to second-guess the code in avr.c.
Bootstrapped and regression tested under hppa-linux.
OK for mainline?
* emit-rtl.c (get_first_nonnote_insn, get_last_nonnote_insn): New
functions.
* rtl.h (get_first_nonnote_insn, get_last_nonnote_insn): Declare.
* avr/avr.c (avr_output_function_epilogue): Use above to determine
function size.
* pa/pa.c (pa_output_function_prologue): Likewise.
This is fine.

Thanks,
jeff
John David Anglin
2002-06-05 21:23:10 UTC
Permalink
config.gcc into
#ifdef ABC
#define ABC foo
#endif
Oops, typo.

#ifndef ABC
#define ABC foo
#endif

Dave
--
J. David Anglin ***@nrc.ca
National Research Council of Canada (613) 990-0752 (FAX: 952-6605)
John David Anglin
2002-06-12 17:03:12 UTC
Permalink
* Makefile.in (tm_defines): New configuration variable.
(cs-config.h, cs-hconfig.h, cs-tconfig.h): Rename DEFINES to XM_DEFINES.
Pass tm_defines in TM_DEFINES.
(cs-tm_p.h): Rename DEFINES to XM_DEFINES. Pass TM_DEFINES.
* config.gcc (tm_defines): New configuration variable.
(hppa*-*-* | parisc*-*-*): Use tm_defines instead of pa-700.h and
pa-7100.h headers. Change hppa1* scheduling default to 7100LC.
* configure.in: Substitute tm_defines.
* configure: Rebuilt.
* mkconfig.sh: Rename DEFINES to XM_DEFINES. Output TM_DEFINES.
* pa/pa-700.h: Delete file.
* pa/pa-7100.h: Delete file.
The above patch has not been reviewed. Could one of the build machinery
maintainers review it?

Thanks,
Dave
--
J. David Anglin ***@nrc.ca
National Research Council of Canada (613) 990-0752 (FAX: 952-6605)
DJ Delorie
2002-06-12 17:22:52 UTC
Permalink
Post by John David Anglin
* Makefile.in (tm_defines): New configuration variable.
(cs-config.h, cs-hconfig.h, cs-tconfig.h): Rename DEFINES to XM_DEFINES.
Pass tm_defines in TM_DEFINES.
(cs-tm_p.h): Rename DEFINES to XM_DEFINES. Pass TM_DEFINES.
* config.gcc (tm_defines): New configuration variable.
(hppa*-*-* | parisc*-*-*): Use tm_defines instead of pa-700.h and
pa-7100.h headers. Change hppa1* scheduling default to 7100LC.
* configure.in: Substitute tm_defines.
* configure: Rebuilt.
* mkconfig.sh: Rename DEFINES to XM_DEFINES. Output TM_DEFINES.
* pa/pa-700.h: Delete file.
* pa/pa-7100.h: Delete file.
The above patch has not been reviewed. Could one of the build machinery
maintainers review it?
Sorry I missed it, but the subject line was a bit misleading. The
patch is fine. However, the PPC already has a way of selecting the
default CPU (and thus scheduling etc) with a configure option and
config.gcc. Did you look at that mechanism before implementing this
one?

And I predict that at some point in the future, there will be a thread
about how all those defines in config.gcc are cluttering up the
script, and why can't we put them all in target-specific headers?
John David Anglin
2002-06-12 17:51:05 UTC
Post by DJ Delorie
Sorry I missed it, but the subject line was a bit misleading. The
patch is fine. However, the PPC already has a way of selecting the
default CPU (and thus scheduling etc) with a configure option and
config.gcc. Did you look at that mechanism before implementing this
one?
I did but I will look more closely at what ppc has done.
Post by DJ Delorie
And I predict that at some point in the future, there will be a thread
about how all those defines in config.gcc are cluttering up the
script, and why can't we put them all in target-specific headers?
The goal here was to eliminate small target-specific headers and provide
a mechanism for selecting target options in the configuration process.
At the moment, mkconfig.sh outputs DEFINES after HEADERS. I would
argue that we should output DEFINES first so that they can be used
to select build, host and target options in HEADERS and other code
that includes the various config files. Placing the defines first
allows defaults in the headers to be overridden.

There were a number of comments after I submitted the patch that
indicated the number of target-specific headers in the i386 config
was too large and the inclusion process difficult to follow.

Dave
--
J. David Anglin ***@nrc.ca
National Research Council of Canada (613) 990-0752 (FAX: 952-6605)
DJ Delorie
2002-06-12 18:17:11 UTC
Post by John David Anglin
that includes the various config files. Placing the defines first
allows defaults in the headers to be overridden.
Placing defines after allows you to redefine something with a simple
undef/define, without the main header needing to know that you're
doing it. There are cases that argue both ways, though.
Post by John David Anglin
There were a number of comments after I submitted the patch that
indicated the number of target-specific headers in the i386 config
was too large and the inclusion process difficult to follow.
The complexity has to go somewhere ;)
John David Anglin
2002-06-12 18:47:21 UTC
Post by DJ Delorie
Post by John David Anglin
that includes the various config files. Placing the defines first
allows defaults in the headers to be overridden.
Placing defines after allows you to redefine something with a simple
undef/define, without the main header needing to know that you're
doing it. There are cases that argue both ways, though.
How does that work? mkconfig.sh doesn't do undefs.

Although mkconfig.sh could be modified to undef/define when necessary,
I still think it better to put the defines first because the headers
may contain secondary defines based on the identifier that you want
to redefine.
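
The concern about secondary defines can be sketched as follows. This is only an illustration, not code from GCC: SCHED_MODEL and SCHED_LATENCY are hypothetical names, and the first part stands in for a target header while the trailing part stands in for a define emitted after the headers.

```c
/* Hypothetical "header" part: a primary default and a secondary
   define derived from it.  Neither name exists in GCC.  */
#define SCHED_MODEL 1
#if SCHED_MODEL == 1
#define SCHED_LATENCY 10
#else
#define SCHED_LATENCY 20
#endif

/* Hypothetical override emitted AFTER the header.  */
#undef SCHED_MODEL
#define SCHED_MODEL 2

/* SCHED_MODEL is now 2, but SCHED_LATENCY was selected while it was
   still 1, so the two no longer agree.  */
int sched_model (void) { return SCHED_MODEL; }
int sched_latency (void) { return SCHED_LATENCY; }
```

Had the override been seen before the header, the #if would have picked the latency that matches the overridden model.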

Dave
--
J. David Anglin ***@nrc.ca
National Research Council of Canada (613) 990-0752 (FAX: 952-6605)
DJ Delorie
2002-06-12 19:00:52 UTC
Post by John David Anglin
How does that work? mkconfig.sh doesn't do undefs.
I didn't say it would work, but it's a technique we use elsewhere.
Post by John David Anglin
I still think it better to put the defines first because the headers
may contain secondary defines based on the identifier that you want
to redefine.
Consider, for example, this from i386/cygwin.h:

/* For Win32 ABI compatibility */
#undef DEFAULT_PCC_STRUCT_RETURN
#define DEFAULT_PCC_STRUCT_RETURN 0

To do this with defines before includes, you'd need to put a #ifdef
around *every single* define in *every* header file, because you won't
know which will be overridden by targets.
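
The two orderings can be compared in a small sketch. The macro names STRUCT_RETURN_A and STRUCT_RETURN_B are hypothetical and chosen only to show both styles in one translation unit; real headers would use a single name.

```c
/* Style 1: the define comes first, so the header must guard its
   default with #ifndef.  */
#define STRUCT_RETURN_A 0          /* target's choice, seen first */

#ifndef STRUCT_RETURN_A            /* guard required in the header */
#define STRUCT_RETURN_A 1          /* header default */
#endif

/* Style 2: the header default comes first and needs no guard; the
   target overrides it afterwards with #undef/#define.  */
#define STRUCT_RETURN_B 1          /* header default, unguarded */

#undef STRUCT_RETURN_B             /* target override */
#define STRUCT_RETURN_B 0

int struct_return_a (void) { return STRUCT_RETURN_A; }
int struct_return_b (void) { return STRUCT_RETURN_B; }
```

Both styles end up with the target's value; the difference is where the bookkeeping lives: in every header default (style 1) or in the overriding file (style 2).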
John David Anglin
2002-06-12 20:01:31 UTC
Post by DJ Delorie
Post by John David Anglin
I still think it better to put the defines first because the headers
may contain secondary defines based on the identifier that you want
to redefine.
/* For Win32 ABI compatibility */
#undef DEFAULT_PCC_STRUCT_RETURN
#define DEFAULT_PCC_STRUCT_RETURN 0
To do this with defines before includes, you'd need to put a #ifdef
around *every single* define in *every* header file, because you won't
know which will be overridden by targets.
Yes. However, it's not reasonable to define the entire cygwin configuration
in configure. If configure checked how to define DEFAULT_PCC_STRUCT_RETURN
then I would say it has a place in configure, and that there should be
a way independent of include files to override the default define. I
agree that in some cases putting the defines after the headers is simpler
and avoids the ifdef burden; however, you lose some flexibility in what
the headers can do.

Dave
--
J. David Anglin ***@nrc.ca
National Research Council of Canada (613) 990-0752 (FAX: 952-6605)
John David Anglin
2002-06-15 18:04:33 UTC
Post by DJ Delorie
Sorry I missed it, but the subject line was a bit misleading. The
patch is fine. However, the PPC already has a way of selecting the
default CPU (and thus scheduling etc) with a configure option and
config.gcc. Did you look at that mechanism before implementing this
one?
The mechanism involves setting flag bits in TARGET_CPU_DEFAULT. This could
be done but then there would be no way for a user to fine tune the default
scheduling model selected by configure except by hacking config.gcc. One
of the goals of the patch was to enable the user to set TARGET_SCHED_DEFAULT
in BOOT_CFLAGS.

TARGET_CPU_DEFAULT involves a collection of miscellaneous flag bits that
get merged in a rather complex manner in the configuration process to produce
the define output at the beginning of the various *config.h files. This
define is not protected by an ifndef, so it's not currently possible for
a user to override it in BOOT_CFLAGS.

Setting the PA architecture bits in TARGET_CPU_DEFAULT is currently a problem.
The default is still PA 1.1 even on PA 2.0 machines. I plan to make some
changes in this area in the near future after a bug in GAS affecting level
2.0 code is fixed for elf32.

Dave
--
J. David Anglin ***@nrc.ca
National Research Council of Canada (613) 990-0752 (FAX: 952-6605)
John David Anglin
2002-06-18 22:44:04 UTC
Thanks. I'll give it a try ASAP. However, it looks like casesi is
broken on vax. An abort occurs in stage1 running gengenrtl.
Actually, stage2. We lose the jump table in the bbro pass.

Dave
--
J. David Anglin ***@nrc.ca
National Research Council of Canada (613) 990-0752 (FAX: 952-6605)
Herman ten Brugge
2002-06-24 20:43:37 UTC
Hello,
I noticed a problem in the current mainline release. The inlining does not
work when -O3 is specified. The typo is in c-common.c (see patch below).
I do not have write permission so cannot make the change after approval.
Oops. This is wrong. The correct fix is below. I also forgot to mention
that this did not work for the c4x target.
I do not have write permission so cannot make the change after approval.

Herman.



2002-06-24 Herman A.J. ten Brugge <***@net.HCC.nl>

* c4x.h (TARGET_CPU_CPP_BUILTINS): Check flag_inline_functions and
flag_inline_trees to enable inlining.


--- c4x.h.org Mon Jun 24 21:37:52 2002
+++ c4x.h Mon Jun 24 21:38:20 2002
@@ -35,7 +35,8 @@
builtin_define ("_BIGMODEL"); \
if (!TARGET_MEMPARM) \
builtin_define ("_REGPARM"); \
- if (flag_inline_functions) \
+ if (flag_inline_functions \
+ || flag_inline_trees) \
builtin_define ("_INLINE"); \
if (TARGET_C3X) \
{ \
Michael Hayes
2002-06-29 07:51:50 UTC
Post by Herman ten Brugge
Oops. This is wrong. The correct fix is below. I also forgot to mention
that this did not work for the c4x target.
I do not have write permission so cannot make the change after approval.
I installed this for the c4x target. Thanks, Herman.

Michael.
John David Anglin
2002-07-11 18:54:11 UTC
After installing this patch, I realized that the predicate for the adddi3
expander can be improved rather than forcing constants that don't fit
into a register.
Here is the patch. Tested on hppa-linux, hppa2.0w-hp-hpux11.11
and hppa64-hp-hpux11.11. Applied to main. A combined version of
this patch and the previous patch has been applied to the 3.1
branch.
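
The predicate change comes down to asking whether a constant fits a signed immediate field: 11 bits in the 32-bit case and 14 bits for TARGET_64BIT, as the adddi3_operand function in the patch shows. A minimal sketch of that range test follows; fits_signed_bits is a hypothetical helper written for illustration and is not GCC's INT_11_BITS or INT_14_BITS macro.

```c
/* Return nonzero if VALUE fits in a signed two's-complement field
   that is BITS bits wide (1 <= BITS <= 32).  */
int
fits_signed_bits (long long value, int bits)
{
  long long limit = 1LL << (bits - 1);
  return value >= -limit && value < limit;
}
```

With an 11-bit field the accepted range is -1024 to 1023, and with a 14-bit field it is -8192 to 8191; constants outside the range fail the predicate and have to be handled some other way, typically by loading them into a register.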

Dave
--
J. David Anglin ***@nrc.ca
National Research Council of Canada (613) 990-0752 (FAX: 952-6605)

2002-07-11 John David Anglin <***@hiauly1.hia.nrc.ca>

* pa.md (adddi3): Change predicate of operand 2 to adddi3_operand
and delete code to force constant to register.
* pa-protos.h (adddi3_operand): Add prototype.
* pa.c (adddi3_operand): New function.

Index: config/pa/pa.md
===================================================================
RCS file: /cvsroot/gcc/gcc/gcc/config/pa/pa.md,v
retrieving revision 1.108
diff -u -3 -p -r1.108 pa.md
--- config/pa/pa.md 11 Jul 2002 05:04:54 -0000 1.108
+++ config/pa/pa.md 11 Jul 2002 15:57:30 -0000
@@ -3813,15 +3813,9 @@
(define_expand "adddi3"
[(set (match_operand:DI 0 "register_operand" "")
(plus:DI (match_operand:DI 1 "register_operand" "")
- (match_operand:DI 2 "arith_operand" "")))]
+ (match_operand:DI 2 "adddi3_operand" "")))]
""
- "
-{
- if (!TARGET_64BIT
- && GET_CODE (operands[2]) == CONST_INT
- && !VAL_11_BITS_P (INTVAL (operands[2])))
- operands[2] = force_reg (DImode, operands[2]);
-}")
+ "")

(define_insn ""
[(set (match_operand:DI 0 "register_operand" "=r")
Index: config/pa/pa-protos.h
===================================================================
RCS file: /cvsroot/gcc/gcc/gcc/config/pa/pa-protos.h,v
retrieving revision 1.14
diff -u -3 -p -r1.14 pa-protos.h
--- config/pa/pa-protos.h 21 Jun 2002 01:37:47 -0000 1.14
+++ config/pa/pa-protos.h 11 Jul 2002 15:57:30 -0000
@@ -63,6 +63,7 @@ extern rtx legitimize_pic_address PARAMS
extern struct rtx_def *gen_cmp_fp PARAMS ((enum rtx_code, rtx, rtx));
extern void hppa_encode_label PARAMS ((rtx));
extern int arith11_operand PARAMS ((rtx, enum machine_mode));
+extern int adddi3_operand PARAMS ((rtx, enum machine_mode));
extern int symbolic_expression_p PARAMS ((rtx));
extern int hppa_address_cost PARAMS ((rtx));
extern int symbolic_memory_operand PARAMS ((rtx, enum machine_mode));
Index: config/pa/pa.c
===================================================================
RCS file: /cvsroot/gcc/gcc/gcc/config/pa/pa.c,v
retrieving revision 1.171
diff -u -3 -p -r1.171 pa.c
--- config/pa/pa.c 11 Jul 2002 05:04:55 -0000 1.171
+++ config/pa/pa.c 11 Jul 2002 15:57:31 -0000
@@ -578,6 +578,18 @@ arith11_operand (op, mode)
|| (GET_CODE (op) == CONST_INT && INT_11_BITS (op)));
}

+/* Return truth value of whether OP can be used as an operand in a
+ adddi3 insn. */
+int
+adddi3_operand (op, mode)
+ rtx op;
+ enum machine_mode mode;
+{
+ return (register_operand (op, mode)
+ || (GET_CODE (op) == CONST_INT
+ && (TARGET_64BIT ? INT_14_BITS (op) : INT_11_BITS (op))));
+}
+
/* A constant integer suitable for use in a PRE_MODIFY memory
reference. */
int
John David Anglin
2002-08-03 05:10:33 UTC
I believe that this patch has broken the v3 build under hppa-linux
and likely for other hppa ports. This libsupc++ library gets linked
against the shared v3 lib, thus all compilations need to be pic on hppa.
I believe "-prefer-pic" was part of the C++ flags. So, we now need
it in the C flags.
Here's a quick fix. Tested on hppa-linux.

Dave
--
J. David Anglin ***@nrc.ca
National Research Council of Canada (613) 990-0752 (FAX: 952-6605)

2002-08-02 John David Anglin <***@hiauly1.hia.nrc.ca>

* libsupc++/Makefile.am (LTCOMPILE): Add LIBSUPCXX_PICFLAGS.

Index: libsupc++/Makefile.am
===================================================================
RCS file: /cvsroot/gcc/gcc/libstdc++-v3/libsupc++/Makefile.am,v
retrieving revision 1.34
diff -u -3 -p -r1.34 Makefile.am
--- libsupc++/Makefile.am 1 Aug 2002 22:16:46 -0000 1.34
+++ libsupc++/Makefile.am 3 Aug 2002 04:29:29 -0000
@@ -126,7 +126,7 @@ dyn-string.o: dyn-string.c

# LTCOMPILE is copied from LTCXXCOMPILE below.
LTCOMPILE = $(LIBTOOL) --tag CC --tag disable-shared --mode=compile $(CC) \
- $(DEFS) $(GCC_INCLUDES) \
+ $(DEFS) $(GCC_INCLUDES) $(LIBSUPCXX_PICFLAGS) \
$(AM_CPPFLAGS) $(CPPFLAGS)
Neil Booth
2002-08-03 06:47:34 UTC
John David Anglin wrote:-
Post by John David Anglin
Here's a quick fix. Tested on hppa-linux.
Thanks John!

Neil.
Benjamin Kosnik
2002-08-03 19:05:15 UTC
This looks good John, please check it in.
John David Anglin
2002-08-03 22:15:24 UTC
Post by Benjamin Kosnik
This looks good John, please check it in.
Looked at the generated Makefile.in and there are more diffs than I like.
It looks like somebody may have manually edited it or I didn't use the
correct command to rebuild it. Anybody else want to do the check in?
I have to go out now.

Dave
--
J. David Anglin ***@nrc.ca
National Research Council of Canada (613) 990-0752 (FAX: 952-6605)
John David Anglin
2002-08-21 16:23:50 UTC
Installed! I suggest waiting a few days to see if the change causes
any problems before considering the branch. I will work up something
for the relevant changes.html(s).
I don't know how I missed this. The return value of remove_dup_nonsys_dirs
isn't correct if there are no system directories. The enclosed patch
fixes the problem.
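
The fallback added by the patch walks the chain to find the last entry before END. A minimal sketch of that walk is below; struct node and last_before are simplified, hypothetical stand-ins for struct search_path and the loop in remove_dup_nonsys_dirs, and the sketch assumes END is either NULL or reachable from HEAD.

```c
#include <stddef.h>

/* Simplified stand-in for struct search_path.  */
struct node
{
  struct node *next;
};

/* Return the last node before END in the list starting at HEAD, or
   NULL if HEAD is already END (the segment is empty).  */
struct node *
last_before (struct node *head, struct node *end)
{
  struct node *prev = NULL;
  struct node *cur;

  for (cur = head; cur != end; cur = cur->next)
    prev = cur;
  return prev;
}
```

Initializing prev to NULL matters for the same reason it does in the patch: when the loop body never runs, the function still returns a defined value.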

Tested with a bootstrap and check on hppa-linux with no regressions.

OK for main?

Dave
--
J. David Anglin ***@nrc.ca
National Research Council of Canada (613) 990-0752 (FAX: 952-6605)

2002-08-21 John David Anglin <***@hiauly1.hia.nrc.ca>

* cppinit.c (remove_dup_nonsys_dirs): Fix warning and return value.

Index: cppinit.c
===================================================================
RCS file: /cvsroot/gcc/gcc/gcc/cppinit.c,v
retrieving revision 1.259
diff -u -3 -p -r1.259 cppinit.c
--- cppinit.c 20 Aug 2002 19:56:29 -0000 1.259
+++ cppinit.c 21 Aug 2002 16:04:16 -0000
@@ -303,12 +303,14 @@ remove_dup_nonsys_dirs (pfile, head_ptr,
struct search_path **head_ptr;
struct search_path *end;
{
- struct search_path *prev, *cur, *other;
+ int sysdir = 0;
+ struct search_path *prev = NULL, *cur, *other;

for (cur = *head_ptr; cur; cur = cur->next)
{
if (cur->sysp)
{
+ sysdir = 1;
for (other = *head_ptr, prev = NULL;
other != end;
other = other ? other->next : *head_ptr)
@@ -326,6 +328,10 @@ remove_dup_nonsys_dirs (pfile, head_ptr,
}
}
}
+
+ if (!sysdir)
+ for (cur = *head_ptr; cur != end; cur = cur->next)
+ prev = cur;

return prev;
}
Zack Weinberg
2002-08-21 16:51:41 UTC
Post by John David Anglin
Installed! I suggest waiting a few days to see if the change causes
any problems before considering the branch. I will work up something
for the relevant changes.html(s).
I don't know how I missed this. The return value of remove_dup_nonsys_dirs
isn't correct if there are no system directories. The enclosed patch
fixes the problem.
Tested with a bootstrap and check on hppa-linux with no regressions.
OK for main?
Yes.

zw
John David Anglin
2002-08-31 15:45:13 UTC
The following compiles your test ok. I have a full bootstrap in
progress. I'm still not completely happy with the comment but
I think it more or less explains why we don't want SImode values
changed to a wider mode.
There is one new testsuite failure. The test execute/20010605-2.c
seg faults in function baz. It looks as if the dp register has been
messed up. We are trying to pass a complex long double.

This is the last part of the baz call:

ldd 112(%r3),%r25
ldd 120(%r3),%r26
ldd 128(%r3),%r27 <== dp clobbered
ldd 136(%r3),%r28
ldo -16(%r30),%r29
b,l baz,%r2
nop

Dave
--
J. David Anglin ***@nrc.ca
National Research Council of Canada (613) 990-0752 (FAX: 952-6605)
Steve Ellcey
2002-09-03 15:45:50 UTC
Post by John David Anglin
ldd 112(%r3),%r25
ldd 120(%r3),%r26
ldd 128(%r3),%r27 <== dp clobbered
ldd 136(%r3),%r28
ldo -16(%r30),%r29
b,l baz,%r2
nop
Hm, I don't see this failure, but I do have a few other changes in my
tree that have not been submitted. One of them seems to be affecting
the calling sequence because I get the following sequence on the call
to baz:

fldd 32(%r3),%fr5
fldd 40(%r3),%fr6
fldd 48(%r3),%fr7
fldd 56(%r3),%fr8
ldo -16(%r30),%r29
b,l baz,%r2

I will see if I can figure out what is affecting this. It might be one
of the following macros I have set to help fix calling interactions with
the HP compiler:

#define MEMBER_TYPE_FORCES_BLK(FIELD, MODE) (1)
#define FUNCTION_ARG_REG_LITTLE_ENDIAN 1
#define PAD_VARARGS_DOWN (!AGGREGATE_TYPE_P (type))

Steve Ellcey
***@cup.hp.com
John David Anglin
2002-09-03 16:12:43 UTC
Post by Steve Ellcey
Hm, I don't see this failure, but I do have a few other changes in my
tree that have not been submitted. One of them seems to be affecting
the calling sequence because I get the following sequence on the call
fldd 32(%r3),%fr5
fldd 40(%r3),%fr6
fldd 48(%r3),%fr7
fldd 56(%r3),%fr8
ldo -16(%r30),%r29
b,l baz,%r2
The above is wrong. The ABI specifies that TFmode parameters and
aggregates including complex numbers are always passed in general
registers. I have a fix to function_arg. This exposed another
problem in expr.c in passing TCmode complex values. I am going to
make one final small tweak and submit the patch later today. With
the complete set of patches, there are 3 fixes to the v3 suite, 8
to g77, and 10 or 11 to the gcc suite.

I have also completed checking a patch to remove the CLASS_CANNOT_CHANGE_MODE*
macros from the 32-bit port.

Dave
--
J. David Anglin ***@nrc.ca
National Research Council of Canada (613) 990-0752 (FAX: 952-6605)
John David Anglin
2002-09-03 19:57:53 UTC
Here is the PA portion of a patch to implement 128-bit long doubles
for TARGET_64BIT.

CLASS_CANNOT_CHANGE_MODE_P is revised so that now the only mode changes
that are inhibited are those from SImode. This is because SImode loads
to floating-point registers do not zero-extend. I believe that this
is necessary as we do SImode loads to FPRs for the xmpy patterns.
They might happen in other rare situations as well.
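
The difference between the old and the revised rule can be sketched with a toy mode table. The enum, mode_size, and the two function names are hypothetical; only the two conditions mirror the macro bodies in the pa64-regs.h part of the patch.

```c
/* Hypothetical miniature of the mode table; sizes are in bytes.  */
enum mode { SImode, SFmode, DImode, DFmode };

static int
mode_size (enum mode m)
{
  return (m == SImode || m == SFmode) ? 4 : 8;
}

/* Old rule: any mode change that alters the size is rejected.  */
int
cannot_change_old (enum mode from, enum mode to)
{
  return mode_size (from) != mode_size (to);
}

/* New rule: only size-altering changes from SImode are rejected.  */
int
cannot_change_new (enum mode from, enum mode to)
{
  return from == SImode && mode_size (from) != mode_size (to);
}
```

Under this sketch a DFmode to SFmode change is rejected by the old rule but allowed by the new one, while SImode to DImode stays rejected in both.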

Based on my reading of the "64-Bit Runtime Architecture for PA-RISC 2.0,
Version 3.3" (see pages 18 and 19), function_arg was grossly mishandling
DCmode and TCmode. I believe that these should be treated in a manner
similar to that for other aggregates (BLKmode). The same is also true
for TFmode quad-precision values. As a result, I was able to considerably
simplify the TARGET_64BIT code.
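
As a rough illustration of the simplified TARGET_64BIT handling, the sketch below maps a multiword argument onto general registers the way the revised function_arg code in this patch does: slot 0 corresponds to r26, a multiword argument starts on an even slot, and each successive 64-bit piece goes into the next lower register. The function name and array interface are hypothetical; the real code builds a PARALLEL of EXPR_LISTs instead.

```c
#define MAX_ARG_WORDS 8

/* WORDS is the number of argument slots already used and SIZE the
   argument size in 64-bit words (> 1).  Fill REGS[] with general
   register numbers and return how many were assigned; return 0 if
   the argument starts beyond the register slots.  */
int
multiword_arg_regs (int words, int size, int regs[MAX_ARG_WORDS])
{
  int alignment = words & 1;      /* pad to an even slot */
  int base = 26 - words - alignment;
  int avail = MAX_ARG_WORDS - words - alignment;
  int n = size < avail ? size : avail;
  int i;

  if (words + alignment >= MAX_ARG_WORDS)
    return 0;
  for (i = 0; i < n; i++)
    regs[i] = base - i;           /* piece i sits at byte offset 8 * i */
  return n;
}
```

For example, a two-word argument arriving after one slot has been used skips slot 1 and lands in r24 and r23.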

I am currently running a bootstrap and regression check. I only made
a couple of small tweaks to the code in pa.c from what I checked yesterday,
so I am not expecting any new surprises. The testsuite results improve
with this patch. I don't see any testsuite failures relating to TFmode
or complex number handling.

Let me know ASAP if you have any comments, particularly if you believe
that I have misinterpreted the ABI wrt quad-precision floating point
and complex parameters. I think that this patch is a candidate for
the branch. Although this fixes a problem in code that likely never
worked, I think it is a good idea to minimize the amount of code using
an incorrect ABI.

Dave
--
J. David Anglin ***@nrc.ca
National Research Council of Canada (613) 990-0752 (FAX: 952-6605)

2002-09-03 John David Anglin <***@hiauly1.hia.nrc.ca>

* pa-64.h (LONG_DOUBLE_TYPE_SIZE): Define to 128.
* pa64-regs.h (CLASS_CANNOT_CHANGE_MODE_P): Inhibit changes from SImode
for floating-point register class.
* pa.c (function_arg): Fix handling of modes wider than one word for
TARGET_64BIT.

Index: config/pa/pa-64.h
===================================================================
RCS file: /cvsroot/gcc/gcc/gcc/config/pa/pa-64.h,v
retrieving revision 1.10
diff -u -3 -p -r1.10 pa-64.h
--- config/pa/pa-64.h 8 May 2002 23:10:59 -0000 1.10
+++ config/pa/pa-64.h 3 Sep 2002 17:27:10 -0000
@@ -65,10 +65,8 @@ Boston, MA 02111-1307, USA. */
#define FLOAT_TYPE_SIZE 32
#undef DOUBLE_TYPE_SIZE
#define DOUBLE_TYPE_SIZE 64
-/* This should be 128, but until we work out the ABI for the 128bit
- FP codes supplied by HP we'll keep it at 64 bits. */
#undef LONG_DOUBLE_TYPE_SIZE
-#define LONG_DOUBLE_TYPE_SIZE 64
+#define LONG_DOUBLE_TYPE_SIZE 128

/* Temporary until we figure out what to do with those *(&@$ 32bit
relocs which appear in stabs. */
Index: config/pa/pa64-regs.h
===================================================================
RCS file: /cvsroot/gcc/gcc/gcc/config/pa/pa64-regs.h,v
retrieving revision 1.9
diff -u -3 -p -r1.9 pa64-regs.h
--- config/pa/pa64-regs.h 11 Nov 2001 17:45:02 -0000 1.9
+++ config/pa/pa64-regs.h 3 Sep 2002 17:27:10 -0000
@@ -234,16 +234,21 @@ enum reg_class { NO_REGS, R1_REGS, GENER

/* If defined, gives a class of registers that cannot be used as the
operand of a SUBREG that changes the mode of the object illegally. */
-/* ??? This may not actually be necessary anymore. But until I can prove
- otherwise it will stay. */
+
#define CLASS_CANNOT_CHANGE_MODE (FP_REGS)

-/* Defines illegal mode changes for CLASS_CANNOT_CHANGE_MODE. */
-#define CLASS_CANNOT_CHANGE_MODE_P(FROM,TO) \
- (GET_MODE_SIZE (FROM) != GET_MODE_SIZE (TO))
+/* Defines illegal mode changes for CLASS_CANNOT_CHANGE_MODE.
+
+ SImode loads to floating-point registers are not zero-extended.
+ The definition for LOAD_EXTEND_OP specifies that integer loads
+ narrower than BITS_PER_WORD will be zero-extended. As a result,
+ we inhibit changes from SImode unless they are to a mode that is
+ identical in size. */
+
+#define CLASS_CANNOT_CHANGE_MODE_P(FROM,TO) \
+ ((FROM) == SImode && GET_MODE_SIZE (FROM) != GET_MODE_SIZE (TO))

-/* The same information, inverted:
- Return the class number of the smallest class containing
+/* Return the class number of the smallest class containing
reg number REGNO. This could be a conditional expression
or could index an array. */

Index: config/pa/pa.c
===================================================================
RCS file: /cvsroot/gcc/gcc/gcc/config/pa/pa.c,v
retrieving revision 1.178
diff -u -3 -p -r1.178 pa.c
--- config/pa/pa.c 31 Aug 2002 19:47:07 -0000 1.178
+++ config/pa/pa.c 3 Sep 2002 17:27:22 -0000
@@ -7446,6 +7446,8 @@ function_arg (cum, mode, type, named, in
int incoming;
{
int max_arg_words = (TARGET_64BIT ? 8 : 4);
+ int arg_size = FUNCTION_ARG_SIZE (mode, type);
+ int alignment = 0;
int fpr_reg_base;
int gpr_reg_base;
rtx retval;
@@ -7456,16 +7458,15 @@ function_arg (cum, mode, type, named, in
this routine should return zero. FUNCTION_ARG_PARTIAL_NREGS will
handle arguments which are split between regs and stack slots if
the ABI mandates split arguments. */
- if (cum->words + FUNCTION_ARG_SIZE (mode, type) > max_arg_words
+ if (cum->words + arg_size > max_arg_words
|| mode == VOIDmode)
return NULL_RTX;
}
else
{
- int offset = 0;
- if (FUNCTION_ARG_SIZE (mode, type) > 1 && (cum->words & 1))
- offset = 1;
- if (cum->words + offset >= max_arg_words
+ if (arg_size > 1)
+ alignment = cum->words & 1;
+ if (cum->words + alignment >= max_arg_words
|| mode == VOIDmode)
return NULL_RTX;
}
@@ -7474,70 +7475,54 @@ function_arg (cum, mode, type, named, in
particularly in their handling of FP registers. We might
be able to cleverly share code between them, but I'm not
going to bother in the hope that splitting them up results
- in code that is more easily understood.
+ in code that is more easily understood. */

- The 64bit code probably is very wrong for structure passing. */
if (TARGET_64BIT)
{
/* Advance the base registers to their current locations.

Remember, gprs grow towards smaller register numbers while
- fprs grow to higher register numbers. Also remember FP regs
- are always 4 bytes wide, while the size of an integer register
- varies based on the size of the target word. */
+ fprs grow to higher register numbers. Also remember that
+ although FP regs are 32-bit addressable, we pretend that
+ the registers are 64-bits wide. */
gpr_reg_base = 26 - cum->words;
fpr_reg_base = 32 + cum->words;

- /* If the argument is more than a word long, then we need to align
- the base registers. Same caveats as above. */
- if (FUNCTION_ARG_SIZE (mode, type) > 1)
+ /* Arguments wider than one word need special treatment. */
+ if (arg_size > 1)
{
- if (mode != BLKmode)
- {
- /* First deal with alignment of the doubleword. */
- gpr_reg_base -= (cum->words & 1);
+ /* Double-extended precision (80-bit), quad-precision (128-bit)
+ and aggregates including complex numbers are aligned on
+ 128-bit boundaries. The first eight 64-bit argument slots
+ are associated one-to-one, with general registers r26
+ through r19, and also with floating-point registers fr4
+ through fr11. Arguments larger than one word are always
+ passed in general registers. */

- /* This seems backwards, but it is what HP specifies. We need
- gpr_reg_base to point to the smaller numbered register of
- the integer register pair. So if we have an even register
- number, then decrement the gpr base. */
- gpr_reg_base -= ((gpr_reg_base % 2) == 0);
-
- /* FP values behave sanely, except that each FP reg is only
- half of word. */
- fpr_reg_base += ((fpr_reg_base % 2) == 0);
- }
- else
+ rtx loc[8];
+ int i, offset = 0, ub = arg_size;
+
+ /* Align the base register. */
+ gpr_reg_base -= alignment;
+
+ ub = MIN (ub, max_arg_words - cum->words - alignment);
+ for (i = 0; i < ub; i++)
{
- rtx loc[8];
- int i, offset = 0, ub;
- ub = FUNCTION_ARG_SIZE (mode, type);
- ub = MIN (ub,
- MAX (0, max_arg_words - cum->words - (cum->words & 1)));
- gpr_reg_base -= (cum->words & 1);
- for (i = 0; i < ub; i++)
- {
- loc[i] = gen_rtx_EXPR_LIST (VOIDmode,
- gen_rtx_REG (DImode,
- gpr_reg_base),
- GEN_INT (offset));
- gpr_reg_base -= 1;
- offset += 8;
- }
- if (ub == 0)
- return NULL_RTX;
- else if (ub == 1)
- return XEXP (loc[0], 0);
- else
- return gen_rtx_PARALLEL (mode, gen_rtvec_v (ub, loc));
+ loc[i] = gen_rtx_EXPR_LIST (VOIDmode,
+ gen_rtx_REG (DImode, gpr_reg_base),
+ GEN_INT (offset));
+ gpr_reg_base -= 1;
+ offset += 8;
}
+
+ return gen_rtx_PARALLEL (mode, gen_rtvec_v (ub, loc));
}
}
else
{
/* If the argument is larger than a word, then we know precisely
which registers we must use. */
- if (FUNCTION_ARG_SIZE (mode, type) > 1)
+ if (arg_size > 1)
{
if (cum->words)
{
@@ -7559,19 +7544,6 @@ function_arg (cum, mode, type, named, in
}
}

- if (TARGET_64BIT && mode == TFmode)
- {
- return
- gen_rtx_PARALLEL
- (mode,
- gen_rtvec (2,
- gen_rtx_EXPR_LIST (VOIDmode,
- gen_rtx_REG (DImode, gpr_reg_base + 1),
- const0_rtx),
- gen_rtx_EXPR_LIST (VOIDmode,
- gen_rtx_REG (DImode, gpr_reg_base),
- GEN_INT (8))));
- }
/* Determine if the argument needs to be passed in both general and
floating point registers. */
if (((TARGET_PORTABLE_RUNTIME || TARGET_64BIT || TARGET_ELF32)
Jeff Law
2002-09-05 23:36:43 UTC
Post by John David Anglin
Here is the PA portion of a patch to implement 128-bit long doubles
for TARGET_64BIT.
Great! This is something that has needed to be fixed for a long long time.
Post by John David Anglin
CLASS_CANNOT_CHANGE_MODE_P is revised so that now the only mode changes
that are inhibited are those from SImode. This is because SImode loads
to floating-point registers do not zero-extend. I believe that this
is necessary as we do SImode loads to FPRs for the xmpy patterns.
They might happen in other rare situations as well.
Yes, it can happen in other rare situations. This sounds correct to me.
Post by John David Anglin
Based on my reading of the "64-Bit Runtime Architecture for PA-RISC 2.0,
Version 3.3" (see pages 18 and 19), function_arg was grossly mishandling
DCmode and TCmode. I believe that these should be treated in a manner
similar to that for other aggregates (BLKmode). The same is also true
for TFmode quad-precision values. As a result, I was able to considerably
simplify the TARGET_64BIT code.
You're probably correct. I never got around to actually testing the
Complex modes or the 128bit FP support for PA64, so it's entirely possible
that function_arg was horribly wrong in how such arguments were handled.
Post by John David Anglin
Let me know ASAP if you have any comments, particularly if you believe
that I have misinterpreted the ABI wrt quad-precision floating point
and complex parameters.
I'm pretty sure you've got the ABI correct.
Post by John David Anglin
I think that this patch is a candidate for
the branch. Although this fixes a problem in code that likely never
worked, I think it is a good idea to minimize the amount of code using
an incorrect ABI.
My gut tells me it's a branch candidate -- PA64 GCC doesn't have a lot of
users right now and fixing these ABI problems would probably be greatly
appreciated by the few users PA64 actually has :-)


jeff
John David Anglin
2002-09-06 00:50:29 UTC
Post by Jeff Law
Post by John David Anglin
Here is the PA portion of a patch to implement 128-bit long doubles
for TARGET_64BIT.
Great! This is something that has needed to be fixed for a long long time.
I think that I have arranged to get pa1.1 quad support from HP. I am
hoping to do PA2.0 quad support inline although Paul Bame may be able
to get the library functions for these as well. These are mainly for
linux although the hpux port would benefit from inline quad.

Thanks to suggestions from Jim Wilson and Steve Ellcey I think that
I have a handle on struct parameters. Jim provided a suggestion on
how to fix the 5-7 small struct problem on the 32 bit port. Steve
has basically provided the infrastructure to fix the passing of
structs on the 64-bit port. I need to do more testing but it looks
as if structs are now being passed as specified by the respective ABIs.

Dave
--
J. David Anglin ***@nrc.ca
National Research Council of Canada (613) 990-0752 (FAX: 952-6605)
Jeff Law
2002-09-06 01:54:14 UTC
Post by John David Anglin
Post by Jeff Law
Post by John David Anglin
Here is the PA portion of a patch to implement 128-bit long doubles
for TARGET_64BIT.
Great! This is something that has needed to be fixed for a long long time.
I think that I have arranged to get pa1.1 quad support from HP. I am
hoping to do PA2.0 quad support inline although Paul Bame may be able
to get the library functions for these as well. These are mainly for
linux although the hpux port would benefit from inline quad.
Is it really a win to do them inline? Presumably you're not talking about
using the quad precision in the ISA (which no processor actually implements),
but open-coding the operations using a series of double-precision ops.
Post by John David Anglin
Thanks to suggestions from Jim Wilson and Steve Ellcey I think that
I have a handle on struct parameters. Jim provided a suggestion on
how to fix the 5-7 small struct problem on the 32 bit port. Steve
has basically provided the infrastructure to fix the passing of
structs on the 64-bit port. I need to do more testing but it looks
as if structs are now being passed as specified by the respective ABIs.
The use of PARALLELs for describing this stuff post-dates my last attempt
to fix these ABI issues for the 32bit port (circa 1995). I've never gone
back to see if the PARALLELs would actually fix the problems we were having.

Jeff
John David Anglin
2002-09-06 04:57:09 UTC
Post by Jeff Law
Is it really a win to do them inline? Presumably you're not talking about
using the quad precision in the ISA (which no processor actually implements),
but open-coding the operations using a series of double-precision ops.
Hmmm, based on a sample of one, you appear to be correct that this
is not implemented. So, I have to agree that there is no point in doing
this inline.

Thanks,
Dave
--
J. David Anglin ***@nrc.ca
National Research Council of Canada (613) 990-0752 (FAX: 952-6605)
John David Anglin
2002-09-03 20:06:54 UTC
This revises emit_group_load to handle splitting a TCmode source into
four DImode pieces. This is needed for the 64-bit runtime architecture
for PA-RISC 2.0. The routine should now handle an arbitrary number
of concatenated objects as long as they are all the same size.
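
The "simple calculation" relied on here is a division and a remainder. The sketch below shows it in isolation; locate_piece is a hypothetical helper, and the equal-size assumption is the same one stated in the patch comment.

```c
#define BITS_PER_UNIT 8

/* Given a byte position within a CONCAT whose parts all have size
   PART_SIZE bytes, compute which part holds that position and the
   bit offset inside that part.  */
void
locate_piece (unsigned bytepos, unsigned part_size,
              unsigned *part, unsigned *bitpos)
{
  *part = bytepos / part_size;
  *bitpos = (bytepos % part_size) * BITS_PER_UNIT;
}
```

For a TCmode value made of two 16-byte halves split into four 8-byte pieces, byte positions 0, 8, 16 and 24 map to (part 0, bit 0), (part 0, bit 64), (part 1, bit 0) and (part 1, bit 64).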

It has been tested on hppa64-hp-hpux11.00, hppa2.0w-hp-hpux11.00,
hppa-unknown-linux-gnu and i686-pc-linux-gnu with no regressions.

Ok for main?

Dave
--
J. David Anglin ***@nrc.ca
National Research Council of Canada (613) 990-0752 (FAX: 952-6605)

2002-09-03 John David Anglin <***@hiauly1.hia.nrc.ca>

* expr.c (emit_group_load): Revise to allow splitting TCmode source
into DImode pieces.

Index: expr.c
===================================================================
RCS file: /cvsroot/gcc/gcc/gcc/expr.c,v
retrieving revision 1.481
diff -u -3 -p -r1.481 expr.c
--- expr.c 3 Sep 2002 00:33:34 -0000 1.481
+++ expr.c 3 Sep 2002 18:45:03 -0000
@@ -2265,21 +2265,26 @@ emit_group_load (dst, orig_src, ssize)
}
else if (GET_CODE (src) == CONCAT)
{
- if ((bytepos == 0
- && bytelen == GET_MODE_SIZE (GET_MODE (XEXP (src, 0))))
- || (bytepos == (HOST_WIDE_INT) GET_MODE_SIZE (GET_MODE (XEXP (src, 0)))
- && bytelen == GET_MODE_SIZE (GET_MODE (XEXP (src, 1)))))
+ unsigned int slen = GET_MODE_SIZE (GET_MODE (src));
+ unsigned int slen0 = GET_MODE_SIZE (GET_MODE (XEXP (src, 0)));
+
+ if ((bytepos == 0 && bytelen == slen0)
+ || (bytepos != 0 && bytepos + bytelen <= slen))
{
- tmps[i] = XEXP (src, bytepos != 0);
+ /* The following assumes that the concatenated objects all
+ have the same size. In this case, a simple calculation
+ can be used to determine the object and the bit field
+ to be extracted. */
+ tmps[i] = XEXP (src, bytepos / slen0);
if (! CONSTANT_P (tmps[i])
&& (GET_CODE (tmps[i]) != REG || GET_MODE (tmps[i]) != mode))
tmps[i] = extract_bit_field (tmps[i], bytelen * BITS_PER_UNIT,
- 0, 1, NULL_RTX, mode, mode, ssize);
+ (bytepos % slen0) * BITS_PER_UNIT,
+ 1, NULL_RTX, mode, mode, ssize);
}
else if (bytepos == 0)
{
- rtx mem = assign_stack_temp (GET_MODE (src),
- GET_MODE_SIZE (GET_MODE (src)), 0);
+ rtx mem = assign_stack_temp (GET_MODE (src), slen, 0);
emit_move_insn (mem, src);
tmps[i] = adjust_address (mem, mode, 0);
}
Richard Henderson
2002-09-04 17:28:46 UTC
Post by John David Anglin
* expr.c (emit_group_load): Revise to allow splitting TCmode source
into DImode pieces.
Ok.


r~
John David Anglin
2002-09-23 18:52:58 UTC
I don't completely understand why I didn't see this earlier but
gcc.dg/typespec-1.c now fails in over 100 places on hppa64-hp-hpux11.00.
I first observed this in a build last night, but 3.2 doesn't warn or
char short *x23;
Looking at c-decl.c, I see that explicit_char is 1 for this case, so
the change that you made doesn't generate an error for "char short".
On further investigation, it appears that grokdeclarator has been
miscompiled. For some reason, when specbits is set, it appears to
be treated as a "long" (i.e., ldd/std are used to load and store the
value). However, when it is tested, it seems to be treated as an
int (i.e., ldw/stw are used). This mixup causes the type checks
to fail.
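
The effect of such a width mismatch on a big-endian target can be modelled in portable C. This is a simulation only, not the code in grokdeclarator, and it assumes both accesses use the same address; `store_be64`, `load_be32` and `store_wide_load_narrow` are hypothetical helpers that mimic a doubleword store followed by a word load.

```c
#include <stdint.h>

/* Store VALUE as 8 bytes, most significant byte first, mimicking a
   doubleword store on a big-endian machine.  */
static void
store_be64 (unsigned char *p, uint64_t value)
{
  int i;

  for (i = 0; i < 8; i++)
    p[i] = (unsigned char) (value >> (56 - 8 * i));
}

/* Load 4 bytes, most significant byte first, mimicking a word load
   from the same address.  */
static uint32_t
load_be32 (const unsigned char *p)
{
  return ((uint32_t) p[0] << 24) | ((uint32_t) p[1] << 16)
         | ((uint32_t) p[2] << 8) | (uint32_t) p[3];
}

/* Write FLAGS with the wide store, then read it back with the narrow
   load.  The word load sees only the upper half of the doubleword, so
   any value that fits in 32 bits comes back as zero.  */
static uint32_t
store_wide_load_narrow (uint64_t flags)
{
  unsigned char slot[8];

  store_be64 (slot, flags);
  return load_be32 (slot);
}
```

In this model a flag word such as 0x10 reads back as 0, which is the kind of result that would make a bit test on the value fail.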

Dave
--
J. David Anglin ***@nrc.ca
National Research Council of Canada (613) 990-0752 (FAX: 952-6605)
John David Anglin
2002-10-01 04:03:35 UTC
the sibcall machinery, and is independent of it. It should
not be shut off by FUNCTION_OK_FOR_SIBCALL.
The main difference between the 32-bit ports (e.g., hppa-linux)
and hppa64-hpux is that calls on hppa64-hpux always pass a pointer
to the first argument on the stack (arg8) as one of the arguments.
Possibly, this is the reason that there is no sibcall.
This appears to be the reason why we don't get recursive tail calls
on hppa64-hpux. sequence_uses_addressof returns nonzero because
current_function_internal_arg_pointer is found. This causes
no_sibcalls_this_function to be set. Thus, I think that all the
sibcall tests need to be xfailed for this target.
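
For reference, a recursive tail call of the general kind under discussion looks like the following. This is a generic sketch, not one of the testsuite files, and `sum_to` is a hypothetical name.

```c
/* Sum the integers 1..N, accumulating into ACC.  The recursive call is
   the last action and its result is returned unchanged, so a compiler
   that supports sibcalls may turn it into a jump that reuses the
   current frame.  Whether it does depends on the target and options.  */
static unsigned long
sum_to (unsigned long n, unsigned long acc)
{
  if (n == 0)
    return acc;
  return sum_to (n - 1, acc + n);
}
```

The function computes the same result whether or not the call is optimized; only the stack usage and the call sequence differ.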

Dave
--
J. David Anglin ***@nrc.ca
National Research Council of Canada (613) 990-0752 (FAX: 952-6605)
John David Anglin
2002-10-21 22:48:27 UTC
* libgcc2.c: Inline __udiv_w_sdiv when compiling __udivdi3,
__divdi3, __umoddi3, or __moddi3.
Ok.
It also fails on hppa64-hp-hpux11.11:

stage1/xgcc -Bstage1/ -B/opt/gnu64/hppa64-hp-hpux11.11/bin/ -g -O2 -mlong-calls -DIN_GCC -W -Wall -Wwrite-strings -Wstrict-prototypes -Wmissing-prototypes -Wtraditional -pedantic -Wno-long-long -fno-common -DHAVE_CONFIG_H -DGENERATOR_FILE -o genattrtab \
genattrtab.o genautomata.o \
rtl.o read-rtl.o bitmap.o ggc-none.o gensupport.o insn-conditions.o print-rtl1.o errors.o \
varray.o ../libiberty/libiberty.a -lm
ld: Duplicate symbol "__udiv_w_sdiv" in files stage1/libgcc.a[_udivdi3.o] and stage1/libgcc.a[_umoddi3.o]

Dave
--
J. David Anglin ***@nrc.ca
National Research Council of Canada (613) 990-0752 (FAX: 952-6605)
Ulrich Weigand
2002-10-21 23:14:35 UTC
itimerspec/../../gcc/gcc/libgcc2.c:485: multiple definition of `__udiv_w_sdiv'
libgcc/./_divdi3.o(.text+0x0):itimerspec/../../gcc/gcc/libgcc2.c:485: first defined here
etc.
Oops, this is broken on platforms that don't define sdiv_qrnnd.

I've just checked in the following patch as obvious; could you verify
that your problems are fixed now?
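
The shape of the guard can be tried in isolation with a small standalone file. This is a sketch only: HAVE_SDIV is an illustrative stand-in for a platform that defines sdiv_qrnnd, and `helper_is_inlined` is a hypothetical function used just to observe the result.

```c
/* Model compiling the __udivdi3 object.  */
#define L_udivdi3

/* Only request the helper when the (stand-in) primitive exists.
   HAVE_SDIV is deliberately left undefined here.  */
#if defined (L_udivdi3) || defined (L_divdi3)
#if defined (HAVE_SDIV)
#define L_udiv_w_sdiv
#endif
#endif

/* Return 1 if this translation unit ended up with L_udiv_w_sdiv
   defined, 0 otherwise.  */
static int
helper_is_inlined (void)
{
#ifdef L_udiv_w_sdiv
  return 1;
#else
  return 0;
#endif
}
```

With HAVE_SDIV undefined the function returns 0, so the object would not carry its own copy of the helper; adding a definition of HAVE_SDIV before the guard flips the result to 1.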

Sorry,
Ulrich



ChangeLog:

* libgcc2.c: Fix __udiv_w_sdiv breakage on platforms that
don't define sdiv_qrnnd.

Index: gcc/libgcc2.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/libgcc2.c,v
retrieving revision 1.151
diff -c -p -r1.151 libgcc2.c
*** gcc/libgcc2.c 21 Oct 2002 20:25:38 -0000 1.151
--- gcc/libgcc2.c 21 Oct 2002 23:05:53 -0000
*************** __muldi3 (DWtype u, DWtype v)
*** 368,374 ****
--- 368,376 ----

#if (defined (L_udivdi3) || defined (L_divdi3) || \
defined (L_umoddi3) || defined (L_moddi3))
+ #if defined (sdiv_qrnnd)
#define L_udiv_w_sdiv
+ #endif
#endif

#ifdef L_udiv_w_sdiv
--
Dr. Ulrich Weigand
***@informatik.uni-erlangen.de
John David Anglin
2002-10-22 06:00:51 UTC
Post by Ulrich Weigand
I've just checked in the following patch as obvious; could you verify
that your problems are fixed now?
Problems fixed.

Thanks,
Dave
--
J. David Anglin ***@nrc.ca
National Research Council of Canada (613) 990-0752 (FAX: 952-6605)
John David Anglin
2002-10-24 19:51:24 UTC
This problem is probably hidden when the arg pointer can be
eliminated as the stack or frame pointer is used instead. However,
we can't eliminate it on hppa64 because the outgoing arg pointer in
a function call is set based on the cumulative size of the outgoing
args of the callee and this varies from one callee to another.
That should be caller, not callee.

Dave
--
J. David Anglin ***@nrc.ca
National Research Council of Canada (613) 990-0752 (FAX: 952-6605)
John David Anglin
2002-10-24 23:26:46 UTC
I did see a small but consistent improvement in the running time
of genattrtab with sibcalls enabled. I don't know where this arises
but I doubt it is from any saving in the call sequence itself.
There are small differences in how the delay slot is used. For
regular calls, we use a branch and link, and for sibcalls, we just
branch. That's the sum of the differences as I see it.
I just rechecked the above. With the only change being the value for
FUNCTION_OK_FOR_SIBCALL, the times for genattrtab on an a500 running
hppa64-hp-hpux11.11 with the call rewrite were 13.24 seconds with
sibcalls and 13.71 seconds without sibcalls, respectively. These
numbers are consistent with what I measured using an earlier version
of the patch. I think the 3 percent improvement is worthwhile.
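
The percentage can be checked with a few lines of C. This is just the arithmetic on the two timings quoted above; `percent_saved` is a hypothetical helper name.

```c
/* Percentage reduction in run time going from BASELINE to IMPROVED
   seconds, relative to BASELINE (which must be nonzero).  */
static double
percent_saved (double baseline, double improved)
{
  return (baseline - improved) / baseline * 100.0;
}
```

Calling percent_saved (13.71, 13.24) gives a value between 3.4 and 3.5, in line with the roughly 3 percent figure.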

Dave
--
J. David Anglin ***@nrc.ca
National Research Council of Canada (613) 990-0752 (FAX: 952-6605)
Jeff Law
2002-10-25 16:04:00 UTC
Post by John David Anglin
I did see a small but consistent improvement in the running time
of genattrtab with sibcalls enabled. I don't know where this arises
but I doubt it is from any saving in the call sequence itself.
There are small differences in how the delay slot is used. For
regular calls, we use a branch and link, and for sibcalls, we just
branch. That's the sum of the differences as I see it.
I just rechecked the above. With the only change being the value for
FUNCTION_OK_FOR_SIBCALL, the times for genattrtab on an a500 running
hppa64-hp-hpux11.11 with the call rewrite were 13.24 seconds with
sibcalls and 13.71 seconds without sibcalls, respectively. These
numbers are consistent with what I measured using an earlier version
of the patch. I think the 3 percent improvement is worthwhile.
Definitely worthwhile.

It's possible the improvements are on the return path side -- by returning
to the caller's parent rather than the caller itself, we avoid one hard
to predict branch (the "bv" in the caller) and maybe one easy to predict
branch (branch to the epilogue).

jeff
John David Anglin
2002-10-25 17:10:13 UTC
Post by Jeff Law
It's possible the improvements are on the return path side -- by returning
to the caller's parent rather than the caller itself, we avoid one hard
to predict branch (the "bv" in the caller) and maybe one easy to predict
branch (branch to the epilogue).
Yes, I can see that returns are hard to predict since they are indirect
and we save one with the sibcall. We definitely should implement the BTS
for returns.

I tried the same test on hppa-linux and there wasn't any difference in
the timing with and without sibcalls. I'm going to try 32-bit hpux.
Possibly, there is a linux/hpux difference. Possibly, there is a
difference wrt the P bit in the ITLB between these systems.

Dave
--
J. David Anglin ***@nrc.ca
National Research Council of Canada (613) 990-0752 (FAX: 952-6605)
John David Anglin
2002-10-31 03:23:50 UTC
The following patch has been applied to the main. It contains everything
but the revised arg pointer handling proposed for hppa64. This patch fixes
a number of bugs in the handling of long calls as previously discussed
and moves the length computation for millicode and regular calls from
pa.md to pa.c.
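
The idea behind the length computation can be sketched in standalone C. This is a simplified model of the 32-bit, non-PIC millicode case in the patch below, not the patch itself: `call_distance` and `millicode_call_length` are hypothetical names, the flags are plain ints standing in for the target macros, the returned value is only the amount added to the base length, and the 64-bit and portable-runtime cases are left out.

```c
#include <limits.h>

/* Estimated distance from the start of the code to an insn at offset
   INSN_ADDR in the current function, given TOTAL bytes already output.
   On unsigned wrap-around, report the largest possible distance so the
   caller picks a long sequence.  */
static unsigned long
call_distance (unsigned long total, unsigned long insn_addr)
{
  unsigned long distance = total + insn_addr;

  if (distance < total)
    distance = ULONG_MAX;
  return distance;
}

/* Bytes to add for the call: 8 when the target is known to be within
   reach, 12 for the long absolute form, and 24 otherwise.  */
static int
millicode_call_length (unsigned long distance, int long_calls,
                       int long_abs_call)
{
  if (!long_calls && distance < 240000)
    return 8;
  if (long_abs_call)
    return 12;
  return 24;
}
```

Saturating the distance on overflow errs on the side of the longer sequence, which matches the stated preference for overestimating a length rather than underestimating it.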

This version has been tested on hppa64-hp-hpux11.11 and hppa-unknown-linux-gnu.
Applied

Dave
--
J. David Anglin ***@nrc.ca
National Research Council of Canada (613) 990-0752 (FAX: 952-6605)

2002-10-30 John David Anglin <***@hiauly.hia.nrc.ca>

* pa-linux.h (ASM_OUTPUT_EXTERNAL_LIBCALL): Define.
* pa-protos.h (attr_length_millicode_call, attr_length_call,
pa_init_machine_status): Declare new global functions.
* pa.c (copy_fp_args, length_fp_args, get_plabel): Declare and
implement new functions.
(attr_length_millicode_call, attr_length_call): Implement.
(total_code_bytes): Change type to long.
(pa_output_function_prologue): Compute total_code_bytes on TARGET_64BIT.
Reset counter if flag_function_sections.
(output_deferred_plabels): Set output alignment to 3 for TARGET_64BIT.
(output_cbranch): Move call to gen_label_rtx.
(output_millicode_call): Rewrite adding long TARGET_64BIT call, expose
delay slot in all variants, shorten pc-relative calls.
(output_call): Rewrite adding long TARGET_64BIT call, improved delay
slot usage and exposure, various new call variants, and shortened
sequences for some variants on TARGET_PA_20.
Miscellaneous format changes.
* pa.h (total_code_bytes): Change type to long.
(MASK_LONG_CALLS, TARGET_LONG_CALLS, TARGET_LONG_ABS_CALL,
TARGET_LONG_PIC_SDIFF_CALL, TARGET_LONG_PIC_PCREL_CALL): Define.
(TARGET_SWITCHES): Add "-mlong-calls" and "-mno-long-calls" options.
(EXTRA_CONSTRAINT, GO_IF_LEGITIMATE_ADDRESS,
LEGITIMIZE_RELOAD_ADDRESS): Don't use long floating point loads and
stores on TARGET_ELF32.
* pa.md (define_delay): Allow insns in delay on TARGET_PORTABLE_RUNTIME.
(unnamed patterns for mulsi3, divsi3, udivsi3, modsi3, umodsi3 and
canonicalize_funcptr_for_compare expanders): Calculate attribute length
using attr_length_millicode_call().
(call_internal_symref, call_value_internal_symref): Clobber register 1.
Calculate attribute length using attr_length_call().
(call_internal_reg_64bit, call_value_internal_reg_64bit): Move gp load
to delay slot.
(sibcall, sibcall_value): Rewrite.
(sibcall_internal_symref, sibcall_value_internal_symref): Clobber
register 1. Use attr_length_call().
(sibcall_internal_symref_64bit, sibcall_value_internal_symref_64bit):
New patterns.
(unnamed pattern for canonicalize_funcptr_for_compare): Rewrite.
* som.h (MEMBER_TYPE_FORCES_BLK): Define.
* t-pa64 (TARGET_LIBGCC2_CFLAGS): Add "-mlong-calls".
* doc/invoke.texi (mlong-calls): Document.

Index: config/pa/pa-linux.h
===================================================================
RCS file: /cvsroot/gcc/gcc/gcc/config/pa/pa-linux.h,v
retrieving revision 1.26
diff -u -3 -p -r1.26 pa-linux.h
--- config/pa/pa-linux.h 3 Oct 2002 04:05:54 -0000 1.26
+++ config/pa/pa-linux.h 30 Oct 2002 17:06:37 -0000
@@ -196,6 +196,19 @@ Boston, MA 02111-1307, USA. */
} \
while (0)

+/* As well as globalizing the label, we need to encode the label
+ to ensure a plabel is generated in an indirect call. */
+
+#undef ASM_OUTPUT_EXTERNAL_LIBCALL
+#define ASM_OUTPUT_EXTERNAL_LIBCALL(FILE, FUN) \
+ do \
+ { \
+ if (!FUNCTION_NAME_P (XSTR (FUN, 0))) \
+ hppa_encode_label (FUN); \
+ (*targetm.asm_out.globalize_label) (FILE, XSTR (FUN, 0)); \
+ } \
+ while (0)
+
/* Linux always uses gas. */
#undef TARGET_GAS
#define TARGET_GAS 1
Index: config/pa/pa-protos.h
===================================================================
RCS file: /cvsroot/gcc/gcc/gcc/config/pa/pa-protos.h,v
retrieving revision 1.18
diff -u -3 -p -r1.18 pa-protos.h
--- config/pa/pa-protos.h 20 Oct 2002 22:37:12 -0000 1.18
+++ config/pa/pa-protos.h 30 Oct 2002 17:06:37 -0000
@@ -105,6 +105,8 @@ extern int jump_in_call_delay PARAMS ((r
extern enum reg_class secondary_reload_class PARAMS ((enum reg_class,
enum machine_mode, rtx));
extern int hppa_fpstore_bypass_p PARAMS ((rtx, rtx));
+extern int attr_length_millicode_call PARAMS ((rtx, int));
+extern int attr_length_call PARAMS ((rtx, int));

/* Declare functions defined in pa.c and used in templates. */

Index: config/pa/pa.c
===================================================================
RCS file: /cvsroot/gcc/gcc/gcc/config/pa/pa.c,v
retrieving revision 1.184
diff -u -3 -p -r1.184 pa.c
--- config/pa/pa.c 22 Oct 2002 23:05:19 -0000 1.184
+++ config/pa/pa.c 30 Oct 2002 17:06:42 -0000
@@ -121,11 +121,13 @@ static void pa_globalize_label PARAMS ((
ATTRIBUTE_UNUSED;
static void pa_asm_output_mi_thunk PARAMS ((FILE *, tree, HOST_WIDE_INT,
HOST_WIDE_INT, tree));
-
+static void copy_fp_args PARAMS ((rtx)) ATTRIBUTE_UNUSED;
+static int length_fp_args PARAMS ((rtx)) ATTRIBUTE_UNUSED;
+static struct deferred_plabel *get_plabel PARAMS ((const char *))
+ ATTRIBUTE_UNUSED;

/* Save the operands last given to a compare for use when we
generate a scc or bcc insn. */
-
rtx hppa_compare_op0, hppa_compare_op1;
enum cmp_type hppa_branch_type;

@@ -149,12 +151,10 @@ static rtx find_addr_reg PARAMS ((rtx));

/* Keep track of the number of bytes we have output in the CODE subspaces
during this compilation so we'll know when to emit inline long-calls. */
-
-unsigned int total_code_bytes;
+unsigned long total_code_bytes;

/* Variables to handle plabels that we discover are necessary at assembly
output time. They are output after the current function. */
-
struct deferred_plabel GTY(())
{
rtx internal_label;
@@ -3197,14 +3197,14 @@ pa_output_function_prologue (file, size)
fputs ("\n\t.ENTRY\n", file);

/* If we're using GAS and SOM, and not using the portable runtime model,
- then we don't need to accumulate the total number of code bytes. */
+ or function sections, then we don't need to accumulate the total number
+ of code bytes. */
if ((TARGET_GAS && TARGET_SOM && ! TARGET_PORTABLE_RUNTIME)
- /* FIXME: we can't handle long calls for TARGET_64BIT. */
- || TARGET_64BIT)
+ || flag_function_sections)
total_code_bytes = 0;
else if (INSN_ADDRESSES_SET_P ())
{
- unsigned int old_total = total_code_bytes;
+ unsigned long old_total = total_code_bytes;

total_code_bytes += INSN_ADDRESSES (INSN_UID (get_last_nonnote_insn ()));
total_code_bytes += FUNCTION_BOUNDARY / BITS_PER_UNIT;
@@ -4726,6 +4726,47 @@ output_global_address (file, x, round_co
output_addr_const (file, x);
}

+static struct deferred_plabel *
+get_plabel (fname)
+ const char *fname;
+{
+ size_t i;
+
+ /* See if we have already put this function on the list of deferred
+ plabels. This list is generally small, so a linear search is not
+ too ugly. If it proves too slow replace it with something faster. */
+ for (i = 0; i < n_deferred_plabels; i++)
+ if (strcmp (fname, deferred_plabels[i].name) == 0)
+ break;
+
+ /* If the deferred plabel list is empty, or this entry was not found
+ on the list, create a new entry on the list. */
+ if (deferred_plabels == NULL || i == n_deferred_plabels)
+ {
+ const char *real_name;
+
+ if (deferred_plabels == 0)
+ deferred_plabels = (struct deferred_plabel *)
+ ggc_alloc (sizeof (struct deferred_plabel));
+ else
+ deferred_plabels = (struct deferred_plabel *)
+ ggc_realloc (deferred_plabels,
+ ((n_deferred_plabels + 1)
+ * sizeof (struct deferred_plabel)));
+
+ i = n_deferred_plabels++;
+ deferred_plabels[i].internal_label = gen_label_rtx ();
+ deferred_plabels[i].name = ggc_strdup (fname);
+
+ /* Gross. We have just implicitly taken the address of this function,
+ mark it as such. */
+ real_name = (*targetm.strip_name_encoding) (fname);
+ TREE_SYMBOL_REFERENCED (get_identifier (real_name)) = 1;
+ }
+
+ return &deferred_plabels[i];
+}
+
void
output_deferred_plabels (file)
FILE *file;
@@ -4737,7 +4778,7 @@ output_deferred_plabels (file)
if (n_deferred_plabels)
{
data_section ();
- ASM_OUTPUT_ALIGN (file, 2);
+ ASM_OUTPUT_ALIGN (file, TARGET_64BIT ? 3 : 2);
}

/* Now output the deferred plabels. */
@@ -5323,9 +5364,9 @@ hppa_va_arg (valist, type)

const char *
output_cbranch (operands, nullify, length, negated, insn)
- rtx *operands;
- int nullify, length, negated;
- rtx insn;
+ rtx *operands;
+ int nullify, length, negated;
+ rtx insn;
{
static char buf[100];
int useskip = 0;
@@ -5499,12 +5540,11 @@ output_cbranch (operands, nullify, lengt
xoperands[1] = operands[1];
xoperands[2] = operands[2];
xoperands[3] = operands[3];
- if (TARGET_SOM || ! TARGET_GAS)
- xoperands[4] = gen_label_rtx ();

output_asm_insn ("{bl|b,l} .+8,%%r1", xoperands);
- if (TARGET_SOM || ! TARGET_GAS)
+ if (TARGET_SOM || !TARGET_GAS)
{
+ xoperands[4] = gen_label_rtx ();
output_asm_insn ("addil L'%l0-%l4,%%r1", xoperands);
ASM_OUTPUT_INTERNAL_LABEL (asm_out_file, "L",
CODE_LABEL_NUMBER (xoperands[4]));
@@ -5536,10 +5576,10 @@ output_cbranch (operands, nullify, lengt

const char *
output_bb (operands, nullify, length, negated, insn, which)
- rtx *operands ATTRIBUTE_UNUSED;
- int nullify, length, negated;
- rtx insn;
- int which;
+ rtx *operands ATTRIBUTE_UNUSED;
+ int nullify, length, negated;
+ rtx insn;
+ int which;
{
static char buf[100];
int useskip = 0;
@@ -5684,10 +5724,10 @@ output_bb (operands, nullify, length, ne

const char *
output_bvb (operands, nullify, length, negated, insn, which)
- rtx *operands ATTRIBUTE_UNUSED;
- int nullify, length, negated;
- rtx insn;
- int which;
+ rtx *operands ATTRIBUTE_UNUSED;
+ int nullify, length, negated;
+ rtx insn;
+ int which;
{
static char buf[100];
int useskip = 0;
@@ -6043,442 +6083,594 @@ output_movb (operands, insn, which_alter
}
}

+/* Copy any FP arguments in INSN into integer registers. */
+static void
+copy_fp_args (insn)
+ rtx insn;
+{
+ rtx link;
+ rtx xoperands[2];

-/* INSN is a millicode call. It may have an unconditional jump in its delay
- slot.
+ for (link = CALL_INSN_FUNCTION_USAGE (insn); link; link = XEXP (link, 1))
+ {
+ int arg_mode, regno;
+ rtx use = XEXP (link, 0);

- CALL_DEST is the routine we are calling. */
+ if (! (GET_CODE (use) == USE
+ && GET_CODE (XEXP (use, 0)) == REG
+ && FUNCTION_ARG_REGNO_P (REGNO (XEXP (use, 0)))))
+ continue;

-const char *
-output_millicode_call (insn, call_dest)
- rtx insn;
- rtx call_dest;
-{
- int attr_length = get_attr_length (insn);
- int seq_length = dbr_sequence_length ();
- int distance;
- rtx xoperands[4];
- rtx seq_insn;
+ arg_mode = GET_MODE (XEXP (use, 0));
+ regno = REGNO (XEXP (use, 0));

- xoperands[3] = gen_rtx_REG (Pmode, TARGET_64BIT ? 2 : 31);
+ /* Is it a floating point register? */
+ if (regno >= 32 && regno <= 39)
+ {
+ /* Copy the FP register into an integer register via memory. */
+ if (arg_mode == SFmode)
+ {
+ xoperands[0] = XEXP (use, 0);
+ xoperands[1] = gen_rtx_REG (SImode, 26 - (regno - 32) / 2);
+ output_asm_insn ("{fstws|fstw} %0,-16(%%sr0,%%r30)", xoperands);
+ output_asm_insn ("ldw -16(%%sr0,%%r30),%1", xoperands);
+ }
+ else
+ {
+ xoperands[0] = XEXP (use, 0);
+ xoperands[1] = gen_rtx_REG (DImode, 25 - (regno - 34) / 2);
+ output_asm_insn ("{fstds|fstd} %0,-16(%%sr0,%%r30)", xoperands);
+ output_asm_insn ("ldw -12(%%sr0,%%r30),%R1", xoperands);
+ output_asm_insn ("ldw -16(%%sr0,%%r30),%1", xoperands);
+ }
+ }
+ }
+}
+
+/* Compute length of the FP argument copy sequence for INSN. */
+static int
+length_fp_args (insn)
+ rtx insn;
+{
+ int length = 0;
+ rtx link;

- /* Handle common case -- empty delay slot or no jump in the delay slot,
- and we're sure that the branch will reach the beginning of the $CODE$
- subspace. The within reach form of the $$sh_func_adrs call has
- a length of 28 and attribute type of multi. This length is the
- same as the maximum length of an out of reach PIC call to $$div. */
- if ((seq_length == 0
- && (attr_length == 8
- || (attr_length == 28 && get_attr_type (insn) == TYPE_MULTI)))
- || (seq_length != 0
- && GET_CODE (NEXT_INSN (insn)) != JUMP_INSN
- && attr_length == 4))
+ for (link = CALL_INSN_FUNCTION_USAGE (insn); link; link = XEXP (link, 1))
{
- xoperands[0] = call_dest;
- output_asm_insn ("{bl|b,l} %0,%3%#", xoperands);
- return "";
+ int arg_mode, regno;
+ rtx use = XEXP (link, 0);
+
+ if (! (GET_CODE (use) == USE
+ && GET_CODE (XEXP (use, 0)) == REG
+ && FUNCTION_ARG_REGNO_P (REGNO (XEXP (use, 0)))))
+ continue;
+
+ arg_mode = GET_MODE (XEXP (use, 0));
+ regno = REGNO (XEXP (use, 0));
+
+ /* Is it a floating point register? */
+ if (regno >= 32 && regno <= 39)
+ {
+ if (arg_mode == SFmode)
+ length += 8;
+ else
+ length += 12;
+ }
}

- /* This call may not reach the beginning of the $CODE$ subspace. */
- if (attr_length > 8)
+ return length;
+}
+
+/* We include the delay slot in the returned length as it is better to
+ over estimate the length than to under estimate it. */
+
+int
+attr_length_millicode_call (insn, length)
+ rtx insn;
+ int length;
+{
+ unsigned long distance = total_code_bytes + INSN_ADDRESSES (INSN_UID (insn));
+
+ if (distance < total_code_bytes)
+ distance = -1;
+
+ if (TARGET_64BIT)
{
- int delay_insn_deleted = 0;
+ if (!TARGET_LONG_CALLS && distance < 7600000)
+ return length + 8;

- /* We need to emit an inline long-call branch. */
- if (seq_length != 0
- && GET_CODE (NEXT_INSN (insn)) != JUMP_INSN)
- {
- /* A non-jump insn in the delay slot. By definition we can
- emit this insn before the call. */
- final_scan_insn (NEXT_INSN (insn), asm_out_file, optimize, 0, 0);
+ return length + 20;
+ }
+ else if (TARGET_PORTABLE_RUNTIME)
+ return length + 24;
+ else
+ {
+ if (!TARGET_LONG_CALLS && distance < 240000)
+ return length + 8;

- /* Now delete the delay insn. */
- PUT_CODE (NEXT_INSN (insn), NOTE);
- NOTE_LINE_NUMBER (NEXT_INSN (insn)) = NOTE_INSN_DELETED;
- NOTE_SOURCE_FILE (NEXT_INSN (insn)) = 0;
- delay_insn_deleted = 1;
- }
+ if (TARGET_LONG_ABS_CALL && !flag_pic)
+ return length + 12;

- /* PIC long millicode call sequence. */
- if (flag_pic)
- {
- xoperands[0] = call_dest;
- if (TARGET_SOM || ! TARGET_GAS)
- xoperands[1] = gen_label_rtx ();
+ return length + 24;
+ }
+}

- /* Get our address + 8 into %r1. */
- output_asm_insn ("{bl|b,l} .+8,%%r1", xoperands);
+/* INSN is a function call. It may have an unconditional jump
+ in its delay slot.

- if (TARGET_SOM || ! TARGET_GAS)
- {
- /* Add %r1 to the offset of our target from the next insn. */
- output_asm_insn ("addil L%%%0-%1,%%r1", xoperands);
- ASM_OUTPUT_INTERNAL_LABEL (asm_out_file, "L",
- CODE_LABEL_NUMBER (xoperands[1]));
- output_asm_insn ("ldo R%%%0-%1(%%r1),%%r1", xoperands);
- }
- else
- {
- output_asm_insn ("addil L%%%0-$PIC_pcrel$0+4,%%r1", xoperands);
- output_asm_insn ("ldo R%%%0-$PIC_pcrel$0+8(%%r1),%%r1",
- xoperands);
- }
+ CALL_DEST is the routine we are calling. */

- /* Get the return address into %r31. */
- output_asm_insn ("blr 0,%3", xoperands);
+const char *
+output_millicode_call (insn, call_dest)
+ rtx insn;
+ rtx call_dest;
+{
+ int attr_length = get_attr_length (insn);
+ int seq_length = dbr_sequence_length ();
+ int distance;
+ rtx seq_insn;
+ rtx xoperands[3];

- /* Branch to our target which is in %r1. */
- output_asm_insn ("bv,n %%r0(%%r1)", xoperands);
+ xoperands[0] = call_dest;
+ xoperands[2] = gen_rtx_REG (Pmode, TARGET_64BIT ? 2 : 31);

- /* Empty delay slot. Note this insn gets fetched twice and
- executed once. To be safe we use a nop. */
- output_asm_insn ("nop", xoperands);
+ /* Handle the common case where we are sure that the branch will
+ reach the beginning of the $CODE$ subspace. The within reach
+ form of the $$sh_func_adrs call has a length of 28. Because
+ it has an attribute type of multi, it never has a non-zero
+ sequence length. The length of the $$sh_func_adrs is the same
+ as certain out of reach PIC calls to other routines. */
+ if (!TARGET_LONG_CALLS
+ && ((seq_length == 0
+ && (attr_length == 12
+ || (attr_length == 28 && get_attr_type (insn) == TYPE_MULTI)))
+ || (seq_length != 0 && attr_length == 8)))
+ {
+ output_asm_insn ("{bl|b,l} %0,%2", xoperands);
+ }
+ else
+ {
+ if (TARGET_64BIT)
+ {
+ /* It might seem that one insn could be saved by accessing
+ the millicode function using the linkage table. However,
+ this doesn't work in shared libraries and other dynamically
+ loaded objects. Using a pc-relative sequence also avoids
+ problems related to the implicit use of the gp register. */
+ output_asm_insn ("b,l .+8,%%r1", xoperands);
+ output_asm_insn ("addil L'%0-$PIC_pcrel$0+4,%%r1", xoperands);
+ output_asm_insn ("ldo R'%0-$PIC_pcrel$0+8(%%r1),%%r1", xoperands);
+ output_asm_insn ("bve,l (%%r1),%%r2", xoperands);
}
- /* Pure portable runtime doesn't allow be/ble; we also don't have
- PIC support in the assembler/linker, so this sequence is needed. */
else if (TARGET_PORTABLE_RUNTIME)
{
- xoperands[0] = call_dest;
- /* Get the address of our target into %r29. */
- output_asm_insn ("ldil L%%%0,%%r29", xoperands);
- output_asm_insn ("ldo R%%%0(%%r29),%%r29", xoperands);
+ /* Pure portable runtime doesn't allow be/ble; we also don't
+ have PIC support in the assembler/linker, so this sequence
+ is needed. */
+
+ /* Get the address of our target into %r1. */
+ output_asm_insn ("ldil L'%0,%%r1", xoperands);
+ output_asm_insn ("ldo R'%0(%%r1),%%r1", xoperands);

/* Get our return address into %r31. */
- output_asm_insn ("blr %%r0,%3", xoperands);
+ output_asm_insn ("{bl|b,l} .+8,%%r31", xoperands);
+ output_asm_insn ("addi 8,%%r31,%%r31", xoperands);

- /* Jump to our target address in %r29. */
- output_asm_insn ("bv,n %%r0(%%r29)", xoperands);
-
- /* Empty delay slot. Note this insn gets fetched twice and
- executed once. To be safe we use a nop. */
- output_asm_insn ("nop", xoperands);
+ /* Jump to our target address in %r1. */
+ output_asm_insn ("bv %%r0(%%r1)", xoperands);
}
- /* If we're allowed to use be/ble instructions, then this is the
- best sequence to use for a long millicode call. */
- else
+ else if (!flag_pic)
{
- xoperands[0] = call_dest;
- output_asm_insn ("ldil L%%%0,%3", xoperands);
+ output_asm_insn ("ldil L'%0,%%r1", xoperands);
if (TARGET_PA_20)
- output_asm_insn ("be,l R%%%0(%%sr4,%3),%%sr0,%%r31", xoperands);
+ output_asm_insn ("be,l R'%0(%%sr4,%%r1),%%sr0,%%r31", xoperands);
else
- output_asm_insn ("ble R%%%0(%%sr4,%3)", xoperands);
- output_asm_insn ("nop", xoperands);
+ output_asm_insn ("ble R'%0(%%sr4,%%r1)", xoperands);
}
-
- /* If we had a jump in the call's delay slot, output it now. */
- if (seq_length != 0 && !delay_insn_deleted)
+ else
{
- xoperands[0] = XEXP (PATTERN (NEXT_INSN (insn)), 1);
- output_asm_insn ("b,n %0", xoperands);
+ if (TARGET_SOM || !TARGET_GAS)
+ {
+ /* The HP assembler can generate relocations for the
+ difference of two symbols. GAS can do this for a
+ millicode symbol but not an arbitrary external
+ symbol when generating SOM output. */
+ xoperands[1] = gen_label_rtx ();
+ output_asm_insn ("{bl|b,l} .+8,%%r1", xoperands);
+ output_asm_insn ("addi 16,%%r1,%%r31", xoperands);
+ ASM_OUTPUT_INTERNAL_LABEL (asm_out_file, "L",
+ CODE_LABEL_NUMBER (xoperands[1]));
+ output_asm_insn ("addil L'%0-%l1,%%r1", xoperands);
+ output_asm_insn ("ldo R'%0-%l1(%%r1),%%r1", xoperands);
+ }
+ else
+ {
+ output_asm_insn ("{bl|b,l} .+8,%%r1", xoperands);
+ output_asm_insn ("addi 16,%%r1,%%r31", xoperands);
+ output_asm_insn ("addil L'%0-$PIC_pcrel$0+8,%%r1", xoperands);
+ output_asm_insn ("ldo R'%0-$PIC_pcrel$0+12(%%r1),%%r1",
+ xoperands);
+ }

- /* Now delete the delay insn. */
- PUT_CODE (NEXT_INSN (insn), NOTE);
- NOTE_LINE_NUMBER (NEXT_INSN (insn)) = NOTE_INSN_DELETED;
- NOTE_SOURCE_FILE (NEXT_INSN (insn)) = 0;
+ /* Jump to our target address in %r1. */
+ output_asm_insn ("bv %%r0(%%r1)", xoperands);
}
- return "";
}

- /* This call has an unconditional jump in its delay slot and the
- call is known to reach its target or the beginning of the current
- subspace. */
-
- /* Use the containing sequence insn's address. */
- seq_insn = NEXT_INSN (PREV_INSN (XVECEXP (final_sequence, 0, 0)));
+ if (seq_length == 0)
+ output_asm_insn ("nop", xoperands);

- distance = INSN_ADDRESSES (INSN_UID (JUMP_LABEL (NEXT_INSN (insn))))
- - INSN_ADDRESSES (INSN_UID (seq_insn)) - 8;
+ /* We are done if there isn't a jump in the delay slot. */
+ if (seq_length == 0 || GET_CODE (NEXT_INSN (insn)) != JUMP_INSN)
+ return "";

- /* If the branch was too far away, emit a normal call followed
- by a nop, followed by the unconditional branch.
+ /* This call has an unconditional jump in its delay slot. */
+ xoperands[0] = XEXP (PATTERN (NEXT_INSN (insn)), 1);

- If the branch is close, then adjust %r2 from within the
- call's delay slot. */
+ /* See if the return address can be adjusted. Use the containing
+ sequence insn's address. */
+ seq_insn = NEXT_INSN (PREV_INSN (XVECEXP (final_sequence, 0, 0)));
+ distance = (INSN_ADDRESSES (INSN_UID (JUMP_LABEL (NEXT_INSN (insn))))
+ - INSN_ADDRESSES (INSN_UID (seq_insn)) - 8);

- xoperands[0] = call_dest;
- xoperands[1] = XEXP (PATTERN (NEXT_INSN (insn)), 1);
- if (! VAL_14_BITS_P (distance))
- output_asm_insn ("{bl|b,l} %0,%3\n\tnop\n\tb,n %1", xoperands);
- else
+ if (VAL_14_BITS_P (distance))
{
- xoperands[2] = gen_label_rtx ();
- output_asm_insn ("\n\t{bl|b,l} %0,%3\n\tldo %1-%2(%3),%3",
- xoperands);
+ xoperands[1] = gen_label_rtx ();
+ output_asm_insn ("ldo %0-%1(%2),%2", xoperands);
ASM_OUTPUT_INTERNAL_LABEL (asm_out_file, "L",
- CODE_LABEL_NUMBER (xoperands[2]));
+ CODE_LABEL_NUMBER (xoperands[1]));
}
+ else
+ /* ??? This branch may not reach its target. */
+ output_asm_insn ("nop\n\tb,n %0", xoperands);

/* Delete the jump. */
PUT_CODE (NEXT_INSN (insn), NOTE);
NOTE_LINE_NUMBER (NEXT_INSN (insn)) = NOTE_INSN_DELETED;
NOTE_SOURCE_FILE (NEXT_INSN (insn)) = 0;
+
return "";
}

-/* INSN is either a function call. It may have an unconditional jump
+/* We include the delay slot in the returned length as it is better to
+ over estimate the length than to under estimate it. */
+
+int
+attr_length_call (insn, sibcall)
+ rtx insn;
+ int sibcall;
+{
+ unsigned long distance = total_code_bytes + INSN_ADDRESSES (INSN_UID (insn));
+
+ if (distance < total_code_bytes)
+ distance = -1;
+
+ if (TARGET_64BIT)
+ {
+ if (!TARGET_LONG_CALLS
+ && ((!sibcall && distance < 7600000) || distance < 240000))
+ return 8;
+
+ return (sibcall ? 28 : 24);
+ }
+ else
+ {
+ if (!TARGET_LONG_CALLS
+ && ((TARGET_PA_20 && !sibcall && distance < 7600000)
+ || distance < 240000))
+ return 8;
+
+ if (TARGET_LONG_ABS_CALL && !flag_pic)
+ return 12;
+
+ if ((TARGET_SOM && TARGET_LONG_PIC_SDIFF_CALL)
+ || (TARGET_GAS && TARGET_LONG_PIC_PCREL_CALL))
+ {
+ if (TARGET_PA_20)
+ return 20;
+
+ return 28;
+ }
+ else
+ {
+ int length = 0;
+
+ if (TARGET_SOM)
+ length += length_fp_args (insn);
+
+ if (flag_pic)
+ length += 4;
+
+ if (TARGET_PA_20)
+ return (length + 32);
+
+ if (!sibcall)
+ length += 8;
+
+ return (length + 40);
+ }
+ }
+}
+
+/* INSN is a function call. It may have an unconditional jump
in its delay slot.

CALL_DEST is the routine we are calling. */

const char *
output_call (insn, call_dest, sibcall)
- rtx insn;
- rtx call_dest;
- int sibcall;
+ rtx insn;
+ rtx call_dest;
+ int sibcall;
{
+ int delay_insn_deleted = 0;
+ int delay_slot_filled = 0;
int attr_length = get_attr_length (insn);
int seq_length = dbr_sequence_length ();
- int distance;
- rtx xoperands[4];
- rtx seq_insn;
+ rtx xoperands[2];
+
+ xoperands[0] = call_dest;

- /* Handle common case -- empty delay slot or no jump in the delay slot,
- and we're sure that the branch will reach the beginning of the $CODE$
- subspace. */
- if ((seq_length == 0 && attr_length == 12)
- || (seq_length != 0
- && GET_CODE (NEXT_INSN (insn)) != JUMP_INSN
- && attr_length == 8))
+ /* Handle the common case where we're sure that the branch will reach
+ the beginning of the $CODE$ subspace. */
+ if (!TARGET_LONG_CALLS
+ && ((seq_length == 0 && attr_length == 12)
+ || (seq_length != 0 && attr_length == 8)))
{
- xoperands[0] = call_dest;
xoperands[1] = gen_rtx_REG (word_mode, sibcall ? 0 : 2);
- output_asm_insn ("{bl|b,l} %0,%1%#", xoperands);
- return "";
+ output_asm_insn ("{bl|b,l} %0,%1", xoperands);
}
-
- /* This call may not reach the beginning of the $CODE$ subspace. */
- if (attr_length > 12)
+ else
{
- int delay_insn_deleted = 0;
- rtx xoperands[2];
- rtx link;
-
- /* We need to emit an inline long-call branch. Furthermore,
- because we're changing a named function call into an indirect
- function call well after the parameters have been set up, we
- need to make sure any FP args appear in both the integer
- and FP registers. Also, we need move any delay slot insn
- out of the delay slot. And finally, we can't rely on the linker
- being able to fix the call to $$dyncall! -- Yuk!. */
- if (seq_length != 0
- && GET_CODE (NEXT_INSN (insn)) != JUMP_INSN)
- {
- /* A non-jump insn in the delay slot. By definition we can
- emit this insn before the call (and in fact before argument
- relocating. */
- final_scan_insn (NEXT_INSN (insn), asm_out_file, optimize, 0, 0);
-
- /* Now delete the delay insn. */
- PUT_CODE (NEXT_INSN (insn), NOTE);
- NOTE_LINE_NUMBER (NEXT_INSN (insn)) = NOTE_INSN_DELETED;
- NOTE_SOURCE_FILE (NEXT_INSN (insn)) = 0;
- delay_insn_deleted = 1;
- }
-
- /* Now copy any FP arguments into integer registers. */
- for (link = CALL_INSN_FUNCTION_USAGE (insn); link; link = XEXP (link, 1))
- {
- int arg_mode, regno;
- rtx use = XEXP (link, 0);
- if (! (GET_CODE (use) == USE
- && GET_CODE (XEXP (use, 0)) == REG
- && FUNCTION_ARG_REGNO_P (REGNO (XEXP (use, 0)))))
- continue;
-
- arg_mode = GET_MODE (XEXP (use, 0));
- regno = REGNO (XEXP (use, 0));
- /* Is it a floating point register? */
- if (regno >= 32 && regno <= 39)
- {
- /* Copy from the FP register into an integer register
- (via memory). */
- if (arg_mode == SFmode)
- {
- xoperands[0] = XEXP (use, 0);
- xoperands[1] = gen_rtx_REG (SImode, 26 - (regno - 32) / 2);
- output_asm_insn ("{fstws|fstw} %0,-16(%%sr0,%%r30)",
- xoperands);
- output_asm_insn ("ldw -16(%%sr0,%%r30),%1", xoperands);
- }
- else
- {
- xoperands[0] = XEXP (use, 0);
- xoperands[1] = gen_rtx_REG (DImode, 25 - (regno - 34) / 2);
- output_asm_insn ("{fstds|fstd} %0,-16(%%sr0,%%r30)",
- xoperands);
- output_asm_insn ("ldw -12(%%sr0,%%r30),%R1", xoperands);
- output_asm_insn ("ldw -16(%%sr0,%%r30),%1", xoperands);
- }
+ if (TARGET_64BIT)
+ {
+ /* ??? As far as I can tell, the HP linker doesn't support the
+ long pc-relative sequence described in the 64-bit runtime
+ architecture. So, we use a slightly longer indirect call. */
+ struct deferred_plabel *p = get_plabel (XSTR (call_dest, 0));
+
+ xoperands[0] = p->internal_label;
+ xoperands[1] = gen_label_rtx ();
+
+ /* If this isn't a sibcall, we put the load of %r27 into the
+ delay slot. We can't do this in a sibcall as we don't
+ have a second call-clobbered scratch register available. */
+ if (seq_length != 0
+ && GET_CODE (NEXT_INSN (insn)) != JUMP_INSN
+ && !sibcall)
+ {
+ final_scan_insn (NEXT_INSN (insn), asm_out_file,
+ optimize, 0, 0);
+
+ /* Now delete the delay insn. */
+ PUT_CODE (NEXT_INSN (insn), NOTE);
+ NOTE_LINE_NUMBER (NEXT_INSN (insn)) = NOTE_INSN_DELETED;
+ NOTE_SOURCE_FILE (NEXT_INSN (insn)) = 0;
+ delay_insn_deleted = 1;
+ }
+
+ output_asm_insn ("addil LT'%0,%%r27", xoperands);
+ output_asm_insn ("ldd RT'%0(%%r1),%%r1", xoperands);
+ output_asm_insn ("ldd 0(%%r1),%%r1", xoperands);
+
+ if (sibcall)
+ {
+ output_asm_insn ("ldd 24(%%r1),%%r27", xoperands);
+ output_asm_insn ("ldd 16(%%r1),%%r1", xoperands);
+ output_asm_insn ("bve (%%r1)", xoperands);
+ }
+ else
+ {
+ output_asm_insn ("ldd 16(%%r1),%%r2", xoperands);
+ output_asm_insn ("bve,l (%%r2),%%r2", xoperands);
+ output_asm_insn ("ldd 24(%%r1),%%r27", xoperands);
+ delay_slot_filled = 1;
}
}
-
- /* Don't have to worry about TARGET_PORTABLE_RUNTIME here since
- we don't have any direct calls in that case. */
+ else
{
- size_t i;
- const char *name = XSTR (call_dest, 0);
+ int indirect_call = 0;

- /* See if we have already put this function on the list
- of deferred plabels. This list is generally small,
- so a liner search is not too ugly. If it proves too
- slow replace it with something faster. */
- for (i = 0; i < n_deferred_plabels; i++)
- if (strcmp (name, deferred_plabels[i].name) == 0)
- break;
-
- /* If the deferred plabel list is empty, or this entry was
- not found on the list, create a new entry on the list. */
- if (deferred_plabels == NULL || i == n_deferred_plabels)
- {
- const char *real_name;
-
- if (deferred_plabels == 0)
- deferred_plabels = (struct deferred_plabel *)
- ggc_alloc (sizeof (struct deferred_plabel));
+ /* Emit a long call. There are several different sequences
+ of increasing length and complexity. In most cases,
+ they don't allow an instruction in the delay slot. */
+ if (!(TARGET_LONG_ABS_CALL && !flag_pic)
+ && !(TARGET_SOM && TARGET_LONG_PIC_SDIFF_CALL)
+ && !(TARGET_GAS && TARGET_LONG_PIC_PCREL_CALL))
+ indirect_call = 1;
+
+ if (seq_length != 0
+ && GET_CODE (NEXT_INSN (insn)) != JUMP_INSN
+ && !sibcall
+ && (!TARGET_PA_20 || indirect_call))
+ {
+ /* A non-jump insn in the delay slot. By definition we can
+ emit this insn before the call (and in fact before argument
+ relocating). */
+ final_scan_insn (NEXT_INSN (insn), asm_out_file, optimize, 0, 0);
+
+ /* Now delete the delay insn. */
+ PUT_CODE (NEXT_INSN (insn), NOTE);
+ NOTE_LINE_NUMBER (NEXT_INSN (insn)) = NOTE_INSN_DELETED;
+ NOTE_SOURCE_FILE (NEXT_INSN (insn)) = 0;
+ delay_insn_deleted = 1;
+ }
+
+ if (TARGET_LONG_ABS_CALL && !flag_pic)
+ {
+ /* This is the best sequence for making long calls in
+ non-pic code. Unfortunately, GNU ld doesn't provide
+ the stub needed for external calls, and GAS's support
+ for this with the SOM linker is buggy. */
+ output_asm_insn ("ldil L'%0,%%r1", xoperands);
+ if (sibcall)
+ output_asm_insn ("be R'%0(%%sr4,%%r1)", xoperands);
else
- deferred_plabels = (struct deferred_plabel *)
- ggc_realloc (deferred_plabels,
- ((n_deferred_plabels + 1)
- * sizeof (struct deferred_plabel)));
-
- i = n_deferred_plabels++;
- deferred_plabels[i].internal_label = gen_label_rtx ();
- deferred_plabels[i].name = ggc_strdup (name);
-
- /* Gross. We have just implicitly taken the address of this
- function, mark it as such. */
- real_name = (*targetm.strip_name_encoding) (name);
- TREE_SYMBOL_REFERENCED (get_identifier (real_name)) = 1;
- }
-
- /* We have to load the address of the function using a procedure
- label (plabel). Inline plabels can lose for PIC and other
- cases, so avoid them by creating a 32bit plabel in the data
- segment. */
- if (flag_pic)
- {
- xoperands[0] = deferred_plabels[i].internal_label;
- if (TARGET_SOM || ! TARGET_GAS)
- xoperands[1] = gen_label_rtx ();
-
- output_asm_insn ("addil LT%%%0,%%r19", xoperands);
- output_asm_insn ("ldw RT%%%0(%%r1),%%r22", xoperands);
- output_asm_insn ("ldw 0(%%r22),%%r22", xoperands);
-
- /* Get our address + 8 into %r1. */
- output_asm_insn ("{bl|b,l} .+8,%%r1", xoperands);
+ {
+ if (TARGET_PA_20)
+ output_asm_insn ("be,l R'%0(%%sr4,%%r1),%%sr0,%%r31",
+ xoperands);
+ else
+ output_asm_insn ("ble R'%0(%%sr4,%%r1)", xoperands);

- if (TARGET_SOM || ! TARGET_GAS)
+ output_asm_insn ("copy %%r31,%%r2", xoperands);
+ delay_slot_filled = 1;
+ }
+ }
+ else
+ {
+ if (TARGET_SOM && TARGET_LONG_PIC_SDIFF_CALL)
{
- /* Add %r1 to the offset of dyncall from the next insn. */
- output_asm_insn ("addil L%%$$dyncall-%1,%%r1", xoperands);
+ /* The HP assembler and linker can handle relocations
+ for the difference of two symbols. GAS and the HP
+ linker can't do this when one of the symbols is
+ external. */
+ xoperands[1] = gen_label_rtx ();
+ output_asm_insn ("{bl|b,l} .+8,%%r1", xoperands);
+ output_asm_insn ("addil L'%0-%l1,%%r1", xoperands);
ASM_OUTPUT_INTERNAL_LABEL (asm_out_file, "L",
CODE_LABEL_NUMBER (xoperands[1]));
- output_asm_insn ("ldo R%%$$dyncall-%1(%%r1),%%r1", xoperands);
- }
- else
+ output_asm_insn ("ldo R'%0-%l1(%%r1),%%r1", xoperands);
+ }
+ else if (TARGET_GAS && TARGET_LONG_PIC_PCREL_CALL)
{
- output_asm_insn ("addil L%%$$dyncall-$PIC_pcrel$0+4,%%r1",
+ /* GAS currently can't generate the relocations that
+ are needed for the SOM linker under HP-UX using this
+ sequence. The GNU linker doesn't generate the stubs
+ that are needed for external calls on TARGET_ELF32
+ with this sequence. For now, we have to use a
+ longer plabel sequence when using GAS. */
+ output_asm_insn ("{bl|b,l} .+8,%%r1", xoperands);
+ output_asm_insn ("addil L'%0-$PIC_pcrel$0+4,%%r1",
xoperands);
- output_asm_insn ("ldo R%%$$dyncall-$PIC_pcrel$0+8(%%r1),%%r1",
+ output_asm_insn ("ldo R'%0-$PIC_pcrel$0+8(%%r1),%%r1",
xoperands);
}
+ else
+ {
+ /* Emit a long plabel-based call sequence. This is
+ essentially an inline implementation of $$dyncall.
+ We don't actually try to call $$dyncall as this is
+ as difficult as calling the function itself. */
+ struct deferred_plabel *p = get_plabel (XSTR (call_dest, 0));
+
+ xoperands[0] = p->internal_label;
+ xoperands[1] = gen_label_rtx ();
+
+ /* Since the call is indirect, FP arguments in registers
+ need to be copied to the general registers. Then, the
+ argument relocation stub will copy them back. */
+ if (TARGET_SOM)
+ copy_fp_args (insn);

- /* Get the return address into %r31. */
- output_asm_insn ("blr %%r0,%%r31", xoperands);
+ if (flag_pic)
+ {
+ output_asm_insn ("addil LT'%0,%%r19", xoperands);
+ output_asm_insn ("ldw RT'%0(%%r1),%%r1", xoperands);
+ output_asm_insn ("ldw 0(%%r1),%%r1", xoperands);
+ }
+ else
+ {
+ output_asm_insn ("addil LR'%0-$global$,%%r27",
+ xoperands);
+ output_asm_insn ("ldw RR'%0-$global$(%%r1),%%r1",
+ xoperands);
+ }

- /* Branch to our target which is in %r1. */
- output_asm_insn ("bv %%r0(%%r1)", xoperands);
+ output_asm_insn ("bb,>=,n %%r1,30,.+16", xoperands);
+ output_asm_insn ("depi 0,31,2,%%r1", xoperands);
+ output_asm_insn ("ldw 4(%%sr0,%%r1),%%r19", xoperands);
+ output_asm_insn ("ldw 0(%%sr0,%%r1),%%r1", xoperands);

- if (sibcall)
- {
- /* This call never returns, so we do not need to fix the
- return pointer. */
- output_asm_insn ("nop", xoperands);
- }
- else
- {
- /* Copy the return address into %r2 also. */
- output_asm_insn ("copy %%r31,%%r2", xoperands);
+ if (!sibcall && !TARGET_PA_20)
+ {
+ output_asm_insn ("{bl|b,l} .+8,%%r2", xoperands);
+ output_asm_insn ("addi 16,%%r2,%%r2", xoperands);
+ }
}
- }
- else
- {
- xoperands[0] = deferred_plabels[i].internal_label;

- /* Get the address of our target into %r22. */
- output_asm_insn ("addil LR%%%0-$global$,%%r27", xoperands);
- output_asm_insn ("ldw RR%%%0-$global$(%%r1),%%r22", xoperands);
-
- /* Get the high part of the address of $dyncall into %r2, then
- add in the low part in the branch instruction. */
- output_asm_insn ("ldil L%%$$dyncall,%%r2", xoperands);
if (TARGET_PA_20)
- output_asm_insn ("be,l R%%$$dyncall(%%sr4,%%r2),%%sr0,%%r31",
- xoperands);
- else
- output_asm_insn ("ble R%%$$dyncall(%%sr4,%%r2)", xoperands);
-
- if (sibcall)
{
- /* This call never returns, so we do not need to fix the
- return pointer. */
- output_asm_insn ("nop", xoperands);
+ if (sibcall)
+ output_asm_insn ("bve (%%r1)", xoperands);
+ else
+ {
+ if (indirect_call)
+ {
+ output_asm_insn ("bve,l (%%r1),%%r2", xoperands);
+ output_asm_insn ("stw %%r2,-24(%%sp)", xoperands);
+ delay_slot_filled = 1;
+ }
+ else
+ output_asm_insn ("bve,l (%%r1),%%r2", xoperands);
+ }
}
else
{
- /* Copy the return address into %r2 also. */
- output_asm_insn ("copy %%r31,%%r2", xoperands);
- }
- }
- }
+ output_asm_insn ("ldsid (%%r1),%%r31\n\tmtsp %%r31,%%sr0",
+ xoperands);

- /* If we had a jump in the call's delay slot, output it now. */
- if (seq_length != 0 && !delay_insn_deleted)
- {
- xoperands[0] = XEXP (PATTERN (NEXT_INSN (insn)), 1);
- output_asm_insn ("b,n %0", xoperands);
+ if (sibcall)
+ output_asm_insn ("be 0(%%sr0,%%r1)", xoperands);
+ else
+ {
+ output_asm_insn ("ble 0(%%sr0,%%r1)", xoperands);

- /* Now delete the delay insn. */
- PUT_CODE (NEXT_INSN (insn), NOTE);
- NOTE_LINE_NUMBER (NEXT_INSN (insn)) = NOTE_INSN_DELETED;
- NOTE_SOURCE_FILE (NEXT_INSN (insn)) = 0;
+ if (indirect_call)
+ output_asm_insn ("stw %%r31,-24(%%sp)", xoperands);
+ else
+ output_asm_insn ("copy %%r31,%%r2", xoperands);
+ delay_slot_filled = 1;
+ }
+ }
+ }
}
- return "";
}

- /* This call has an unconditional jump in its delay slot and the
- call is known to reach its target or the beginning of the current
- subspace. */
+ if (seq_length == 0 || (delay_insn_deleted && !delay_slot_filled))
+ output_asm_insn ("nop", xoperands);

- /* Use the containing sequence insn's address. */
- seq_insn = NEXT_INSN (PREV_INSN (XVECEXP (final_sequence, 0, 0)));
+ /* We are done if there isn't a jump in the delay slot. */
+ if (seq_length == 0
+ || delay_insn_deleted
+ || GET_CODE (NEXT_INSN (insn)) != JUMP_INSN)
+ return "";

- distance = INSN_ADDRESSES (INSN_UID (JUMP_LABEL (NEXT_INSN (insn))))
- - INSN_ADDRESSES (INSN_UID (seq_insn)) - 8;
+ /* A sibcall should never have a branch in the delay slot. */
+ if (sibcall)
+ abort ();

- /* If the branch is too far away, emit a normal call followed
- by a nop, followed by the unconditional branch. If the branch
- is close, then adjust %r2 in the call's delay slot. */
+ /* This call has an unconditional jump in its delay slot. */
+ xoperands[0] = XEXP (PATTERN (NEXT_INSN (insn)), 1);

- xoperands[0] = call_dest;
- xoperands[1] = XEXP (PATTERN (NEXT_INSN (insn)), 1);
- if (! VAL_14_BITS_P (distance))
- output_asm_insn ("{bl|b,l} %0,%%r2\n\tnop\n\tb,n %1", xoperands);
- else
+ if (!delay_slot_filled)
{
- xoperands[3] = gen_label_rtx ();
- output_asm_insn ("\n\t{bl|b,l} %0,%%r2\n\tldo %1-%3(%%r2),%%r2",
- xoperands);
- ASM_OUTPUT_INTERNAL_LABEL (asm_out_file, "L",
- CODE_LABEL_NUMBER (xoperands[3]));
+ /* See if the return address can be adjusted. Use the containing
+ sequence insn's address. */
+ rtx seq_insn = NEXT_INSN (PREV_INSN (XVECEXP (final_sequence, 0, 0)));
+ int distance = (INSN_ADDRESSES (INSN_UID (JUMP_LABEL (NEXT_INSN (insn))))
+ - INSN_ADDRESSES (INSN_UID (seq_insn)) - 8);
+
+ if (VAL_14_BITS_P (distance))
+ {
+ xoperands[1] = gen_label_rtx ();
+ output_asm_insn ("ldo %0-%1(%%r2),%%r2", xoperands);
+ ASM_OUTPUT_INTERNAL_LABEL (asm_out_file, "L",
+ CODE_LABEL_NUMBER (xoperands[1]));
+ }
+ else
+ /* ??? This branch may not reach its target. */
+ output_asm_insn ("nop\n\tb,n %0", xoperands);
}
+ else
+ /* ??? This branch may not reach its target. */
+ output_asm_insn ("b,n %0", xoperands);

/* Delete the jump. */
PUT_CODE (NEXT_INSN (insn), NOTE);
NOTE_LINE_NUMBER (NEXT_INSN (insn)) = NOTE_INSN_DELETED;
NOTE_SOURCE_FILE (NEXT_INSN (insn)) = 0;
+
return "";
}
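
The handling of an unconditional jump in the call's delay slot at the end of output_call can be modelled on its own. The sketch below is only an illustration, not code from the patch: the names jump_strategy, fits_14_bits and pick_jump_strategy are hypothetical, and the [-8192, 8191] range is an assumption about what VAL_14_BITS_P accepts (a signed 14-bit immediate). The "- 8" term is copied from the distance computation in the patch.

```c
/* Ways to deal with an unconditional jump found in a call's delay slot.
   Hypothetical names, used for illustration only.  */
enum jump_strategy
{
  ADJUST_RETURN_ADDRESS,  /* an ldo in the delay slot rewrites %r2 */
  NOP_THEN_BRANCH,        /* a nop fills the slot, then "b,n" follows */
  BRANCH_ONLY             /* the slot is already used, so only "b,n" */
};

/* Assumed equivalent of VAL_14_BITS_P: nonzero when VALUE fits in a
   signed 14-bit immediate.  */
static int
fits_14_bits (long value)
{
  return value >= -8192 && value <= 8191;
}

/* Mirror the decision made after the call has been emitted.
   JUMP_TARGET_ADDRESS and SEQ_INSN_ADDRESS play the role of the two
   INSN_ADDRESSES lookups in the patch.  */
static enum jump_strategy
pick_jump_strategy (long jump_target_address, long seq_insn_address,
                    int delay_slot_filled)
{
  long distance;

  if (delay_slot_filled)
    return BRANCH_ONLY;

  distance = jump_target_address - seq_insn_address - 8;
  return fits_14_bits (distance) ? ADJUST_RETURN_ADDRESS : NOP_THEN_BRANCH;
}
```

Under these assumptions, a jump target 16 bytes past the sequence gives a distance of 8, so the return address is adjusted. A target 100000 bytes away falls back to the nop plus branch form.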

@@ -6580,8 +6772,8 @@ pa_asm_output_mi_thunk (file, thunk_fnde
{
if (! TARGET_64BIT && ! TARGET_PORTABLE_RUNTIME && flag_pic)
{
- fprintf (file, "\taddil LT%%%s,%%r19\n", lab);
- fprintf (file, "\tldw RT%%%s(%%r1),%%r22\n", lab);
+ fprintf (file, "\taddil LT'%s,%%r19\n", lab);
+ fprintf (file, "\tldw RT'%s(%%r1),%%r22\n", lab);
fprintf (file, "\tldw 0(%%sr0,%%r22),%%r22\n");
fprintf (file, "\tbb,>=,n %%r22,30,.+16\n");
fprintf (file, "\tdepi 0,31,2,%%r22\n");
@@ -6603,13 +6795,13 @@ pa_asm_output_mi_thunk (file, thunk_fnde
{
if (! TARGET_64BIT && ! TARGET_PORTABLE_RUNTIME && flag_pic)
{
- fprintf (file, "\taddil L%%");
+ fprintf (file, "\taddil L'");
fprintf (file, HOST_WIDE_INT_PRINT_DEC, delta);
- fprintf (file, ",%%r26\n\tldo R%%");
+ fprintf (file, ",%%r26\n\tldo R'");
fprintf (file, HOST_WIDE_INT_PRINT_DEC, delta);
fprintf (file, "(%%r1),%%r26\n");
- fprintf (file, "\taddil LT%%%s,%%r19\n", lab);
- fprintf (file, "\tldw RT%%%s(%%r1),%%r22\n", lab);
+ fprintf (file, "\taddil LT'%s,%%r19\n", lab);
+ fprintf (file, "\tldw RT'%s(%%r1),%%r22\n", lab);
fprintf (file, "\tldw 0(%%sr0,%%r22),%%r22\n");
fprintf (file, "\tbb,>=,n %%r22,30,.+16\n");
fprintf (file, "\tdepi 0,31,2,%%r22\n");
@@ -6620,9 +6812,9 @@ pa_asm_output_mi_thunk (file, thunk_fnde
}
else
{
- fprintf (file, "\taddil L%%");
+ fprintf (file, "\taddil L'");
fprintf (file, HOST_WIDE_INT_PRINT_DEC, delta);
- fprintf (file, ",%%r26\n\tb %s\n\tldo R%%", target_name);
+ fprintf (file, ",%%r26\n\tb %s\n\tldo R'", target_name);
fprintf (file, HOST_WIDE_INT_PRINT_DEC, delta);
fprintf (file, "(%%r1),%%r26\n");
}
@@ -6634,7 +6826,7 @@ pa_asm_output_mi_thunk (file, thunk_fnde
data_section ();
fprintf (file, "\t.align 4\n");
ASM_OUTPUT_INTERNAL_LABEL (file, "LTHN", current_thunk_number);
- fprintf (file, "\t.word P%%%s\n", target_name);
+ fprintf (file, "\t.word P'%s\n", target_name);
function_section (thunk_fndecl);
}
current_thunk_number++;
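
For readers following the length computation in the pa.c changes above, the decision tree in attr_length_call can be tried outside GCC. This is a rough model under stated assumptions, not the function itself: struct call_cfg, call_distance and model_attr_length_call are hypothetical names, the struct fields stand in for the TARGET_* macros and flag_pic, and fp_args_length stands in for whatever length_fp_args (insn) would return. The thresholds and byte counts are copied from the patch.

```c
/* Hypothetical stand-ins for the TARGET_* macros and flag_pic.  */
struct call_cfg
{
  int target_64bit;
  int long_calls;
  int pa_20;
  int som;
  int gas;
  int long_abs_call;
  int long_pic_sdiff_call;
  int long_pic_pcrel_call;
  int pic;
  int fp_args_length;       /* stand-in for length_fp_args (insn) */
};

/* Distance estimate; an unsigned wrap-around is treated as "very far".  */
static unsigned long
call_distance (unsigned long total_code_bytes, unsigned long insn_address)
{
  unsigned long distance = total_code_bytes + insn_address;

  if (distance < total_code_bytes)
    distance = (unsigned long) -1;
  return distance;
}

/* Model of the length selection; returns a length in bytes.  */
static int
model_attr_length_call (const struct call_cfg *c, unsigned long distance,
                        int sibcall)
{
  int length = 0;

  if (c->target_64bit)
    {
      if (!c->long_calls
          && ((!sibcall && distance < 7600000) || distance < 240000))
        return 8;
      return sibcall ? 28 : 24;
    }

  if (!c->long_calls
      && ((c->pa_20 && !sibcall && distance < 7600000)
          || distance < 240000))
    return 8;

  if (c->long_abs_call && !c->pic)
    return 12;

  if ((c->som && c->long_pic_sdiff_call)
      || (c->gas && c->long_pic_pcrel_call))
    return c->pa_20 ? 20 : 28;

  /* Long plabel-based sequence.  */
  if (c->som)
    length += c->fp_args_length;
  if (c->pic)
    length += 4;
  if (c->pa_20)
    return length + 32;
  if (!sibcall)
    length += 8;
  return length + 40;
}
```

With every flag clear and a distance of 1000, the model returns 8. Forcing long calls on the same configuration yields the 48-byte plabel sequence.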
Index: config/pa/pa.h
===================================================================
RCS file: /cvsroot/gcc/gcc/gcc/config/pa/pa.h,v
retrieving revision 1.173
diff -u -3 -p -r1.173 pa.h
--- config/pa/pa.h 20 Oct 2002 22:37:12 -0000 1.173
+++ config/pa/pa.h 30 Oct 2002 17:06:44 -0000
@@ -31,7 +31,7 @@ enum cmp_type /* comparison type */
};

/* For long call handling. */
-extern unsigned int total_code_bytes;
+extern unsigned long total_code_bytes;

/* Which processor to schedule for. */

@@ -152,6 +152,12 @@ extern int target_flags;
#define TARGET_GNU_LD (target_flags & MASK_GNU_LD)
#endif

+/* Force generation of long calls. */
+#define MASK_LONG_CALLS 32768
+#ifndef TARGET_LONG_CALLS
+#define TARGET_LONG_CALLS (target_flags & MASK_LONG_CALLS)
+#endif
+
#ifndef TARGET_PA_10
#define TARGET_PA_10 (target_flags & (MASK_PA_11 | MASK_PA_20) == 0)
#endif
@@ -179,6 +185,27 @@ extern int target_flags;
#define TARGET_SOM 0
#endif

+/* The following three defines are potential target switches. The current
+ defines are optimal given the current capabilities of GAS and GNU ld. */
+
+/* Define to a C expression evaluating to true to use long absolute calls.
+ Currently, only the HP assembler and SOM linker support long absolute
+ calls. They are used only in non-pic code. */
+#define TARGET_LONG_ABS_CALL (TARGET_SOM && !TARGET_GAS)
+
+/* Define to a C expression evaluating to true to use long pic symbol
+ difference calls. This is a call variant similar to the long pic
+ pc-relative call. Long pic symbol difference calls are only used with
+ the HP SOM linker. Currently, only the HP assembler supports these
+ calls. GAS doesn't allow an arbitrary difference of two symbols. */
+#define TARGET_LONG_PIC_SDIFF_CALL (!TARGET_GAS)
+
+/* Define to a C expression evaluating to true to use long pic
+ pc-relative calls. Long pic pc-relative calls are only used with
+ GAS. Currently, they are usable for calls within a module but
+ not for external calls. */
+#define TARGET_LONG_PIC_PCREL_CALL 0
+
/* Macro to define tables used to set the flags. This is a
list in braces of target switches with each switch being
{ "NAME", VALUE, "HELP_STRING" }. VALUE is the bits to set,
@@ -237,6 +264,10 @@ extern int target_flags;
N_("Generate code for huge switch statements") }, \
{ "no-big-switch", -MASK_BIG_SWITCH, \
N_("Do not generate code for huge switch statements") }, \
+ { "long-calls", MASK_LONG_CALLS, \
+ N_("Always generate long calls") }, \
+ { "no-long-calls", -MASK_LONG_CALLS, \
+ N_("Generate long calls only when needed") }, \
{ "linker-opt", 0, \
N_("Enable linker optimizations") }, \
SUBTARGET_SWITCHES \
@@ -1193,8 +1224,14 @@ extern int may_call_alloca;
/* Using DFmode forces only short displacements \
to be recognized as valid in reg+d addresses. \
However, this is not necessary for PA2.0 since\
- it has long FP loads/stores. */ \
+ it has long FP loads/stores. \
+ \
+ FIXME: the ELF32 linker clobbers the LSB of \
+ the FP register number in {fldw,fstw} insns. \
+ Thus, we only allow long FP loads/stores on \
+ TARGET_64BIT. */ \
&& memory_address_p ((TARGET_PA_20 \
+ && !TARGET_ELF32 \
? GET_MODE (OP) \
: DFmode), \
XEXP (OP, 0)) \
@@ -1300,7 +1337,7 @@ extern int may_call_alloca;
if (GET_CODE (index) == CONST_INT \
&& ((INT_14_BITS (index) \
&& (TARGET_SOFT_FLOAT \
- || (TARGET_PA_20 \
+ || (TARGET_PA_20 \
&& ((MODE == SFmode \
&& (INTVAL (index) % 4) == 0)\
|| (MODE == DFmode \
@@ -1327,6 +1364,7 @@ extern int may_call_alloca;
/* We can allow symbolic LO_SUM addresses\
for PA2.0. */ \
|| (TARGET_PA_20 \
+ && !TARGET_ELF32 \
&& GET_CODE (XEXP (X, 1)) != CONST_INT)\
|| ((MODE) != SFmode \
&& (MODE) != DFmode))) \
@@ -1340,6 +1378,7 @@ extern int may_call_alloca;
/* We can allow symbolic LO_SUM addresses\
for PA2.0. */ \
|| (TARGET_PA_20 \
+ && !TARGET_ELF32 \
&& GET_CODE (XEXP (X, 1)) != CONST_INT)\
|| ((MODE) != SFmode \
&& (MODE) != DFmode))) \
@@ -1354,7 +1393,7 @@ extern int may_call_alloca;
&& REG_OK_FOR_BASE_P (XEXP (X, 0)) \
&& GET_CODE (XEXP (X, 1)) == UNSPEC \
&& (TARGET_SOFT_FLOAT \
- || TARGET_PA_20 \
+ || (TARGET_PA_20 && !TARGET_ELF32) \
|| ((MODE) != SFmode \
&& (MODE) != DFmode))) \
goto ADDR; \
@@ -1386,7 +1425,7 @@ do { \
rtx new, temp = NULL_RTX; \
\
mask = (GET_MODE_CLASS (MODE) == MODE_FLOAT \
- ? (TARGET_PA_20 ? 0x3fff : 0x1f) : 0x3fff); \
+ ? (TARGET_PA_20 && !TARGET_ELF32 ? 0x3fff : 0x1f) : 0x3fff); \
\
if (optimize \
&& GET_CODE (AD) == PLUS) \
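
The new "long-calls" and "no-long-calls" entries in the pa.h switch table rely on the convention that a positive VALUE sets bits in target_flags and a negative VALUE clears them. The following is a minimal sketch of that convention, assuming this is how the switch table is applied. The names target_switch, switches and apply_switch are hypothetical, and only MASK_LONG_CALLS and the two switch names come from the patch.

```c
#include <string.h>

#define MASK_LONG_CALLS 32768   /* value taken from the patch */

/* Hypothetical reduced form of a switch table entry.  */
struct target_switch
{
  const char *name;
  int value;        /* bits to set, or minus the bits to clear */
};

static const struct target_switch switches[] =
{
  { "long-calls", MASK_LONG_CALLS },
  { "no-long-calls", -MASK_LONG_CALLS },
};

/* Apply the switch NAME (the text after "-m") to FLAGS and return the
   result.  Unknown names leave FLAGS unchanged.  */
static int
apply_switch (int flags, const char *name)
{
  size_t i;

  for (i = 0; i < sizeof switches / sizeof switches[0]; i++)
    if (strcmp (name, switches[i].name) == 0)
      {
        if (switches[i].value < 0)
          flags &= ~-switches[i].value;   /* clear the named bits */
        else
          flags |= switches[i].value;     /* set the named bits */
      }

  return flags;
}
```

In this model, TARGET_LONG_CALLS corresponds to testing (flags & MASK_LONG_CALLS) after all switches have been applied.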
Index: config/pa/pa.md
===================================================================
RCS file: /cvsroot/gcc/gcc/gcc/config/pa/pa.md,v
retrieving revision 1.113
diff -u -3 -p -r1.113 pa.md
--- config/pa/pa.md 11 Sep 2002 02:45:09 -0000 1.113
+++ config/pa/pa.md 30 Oct 2002 17:06:46 -0000
@@ -105,12 +105,9 @@
(define_delay (eq_attr "type" "call")
[(eq_attr "in_call_delay" "true") (nil) (nil)])

-;; millicode call delay slot description. Note it disallows delay slot
-;; when TARGET_PORTABLE_RUNTIME is true.
+;; Millicode call delay slot description.
(define_delay (eq_attr "type" "milli")
- [(and (eq_attr "in_call_delay" "true")
- (eq (symbol_ref "TARGET_PORTABLE_RUNTIME") (const_int 0)))
- (nil) (nil)])
+ [(eq_attr "in_call_delay" "true") (nil) (nil)])

;; Return and other similar instructions.
(define_delay (eq_attr "type" "branch,parallel_branch")
@@ -4089,27 +4086,7 @@
"!TARGET_64BIT"
"* return output_mul_insn (0, insn);"
[(set_attr "type" "milli")
- (set (attr "length")
- (cond [
-;; Target (or stub) within reach
- (and (lt (plus (symbol_ref "total_code_bytes") (pc))
- (const_int 240000))
- (eq (symbol_ref "TARGET_PORTABLE_RUNTIME")
- (const_int 0)))
- (const_int 4)
-
-;; Out of reach PIC
- (ne (symbol_ref "flag_pic")
- (const_int 0))
- (const_int 24)
-
-;; Out of reach PORTABLE_RUNTIME
- (ne (symbol_ref "TARGET_PORTABLE_RUNTIME")
- (const_int 0))
- (const_int 20)]
-
-;; Out of reach, can use ble
- (const_int 12)))])
+ (set (attr "length") (symbol_ref "attr_length_millicode_call (insn, 0)"))])

(define_insn ""
[(set (reg:SI 29) (mult:SI (reg:SI 26) (reg:SI 25)))
@@ -4120,7 +4097,7 @@
"TARGET_64BIT"
"* return output_mul_insn (0, insn);"
[(set_attr "type" "milli")
- (set (attr "length") (const_int 4))])
+ (set (attr "length") (symbol_ref "attr_length_millicode_call (insn, 0)"))])

(define_expand "muldi3"
[(set (match_operand:DI 0 "register_operand" "")
@@ -4211,27 +4188,7 @@
"*
return output_div_insn (operands, 0, insn);"
[(set_attr "type" "milli")
- (set (attr "length")
- (cond [
-;; Target (or stub) within reach
- (and (lt (plus (symbol_ref "total_code_bytes") (pc))
- (const_int 240000))
- (eq (symbol_ref "TARGET_PORTABLE_RUNTIME")
- (const_int 0)))
- (const_int 4)
-
-;; Out of reach PIC
- (ne (symbol_ref "flag_pic")
- (const_int 0))
- (const_int 24)
-
-;; Out of reach PORTABLE_RUNTIME
- (ne (symbol_ref "TARGET_PORTABLE_RUNTIME")
- (const_int 0))
- (const_int 20)]
-
-;; Out of reach, can use ble
- (const_int 12)))])
+ (set (attr "length") (symbol_ref "attr_length_millicode_call (insn, 0)"))])

(define_insn ""
[(set (reg:SI 29)
@@ -4245,7 +4202,7 @@
"*
return output_div_insn (operands, 0, insn);"
[(set_attr "type" "milli")
- (set (attr "length") (const_int 4))])
+ (set (attr "length") (symbol_ref "attr_length_millicode_call (insn, 0)"))])

(define_expand "udivsi3"
[(set (reg:SI 26) (match_operand:SI 1 "move_operand" ""))
@@ -4261,6 +4218,7 @@
"
{
operands[3] = gen_reg_rtx (SImode);
+
if (TARGET_64BIT)
{
operands[5] = gen_rtx_REG (SImode, 2);
@@ -4287,27 +4245,7 @@
"*
return output_div_insn (operands, 1, insn);"
[(set_attr "type" "milli")
- (set (attr "length")
- (cond [
-;; Target (or stub) within reach
- (and (lt (plus (symbol_ref "total_code_bytes") (pc))
- (const_int 240000))
- (eq (symbol_ref "TARGET_PORTABLE_RUNTIME")
- (const_int 0)))
- (const_int 4)
-
-;; Out of reach PIC
- (ne (symbol_ref "flag_pic")
- (const_int 0))
- (const_int 24)
-
-;; Out of reach PORTABLE_RUNTIME
- (ne (symbol_ref "TARGET_PORTABLE_RUNTIME")
- (const_int 0))
- (const_int 20)]
-
-;; Out of reach, can use ble
- (const_int 12)))])
+ (set (attr "length") (symbol_ref "attr_length_millicode_call (insn, 0)"))])

(define_insn ""
[(set (reg:SI 29)
@@ -4321,7 +4259,7 @@
"*
return output_div_insn (operands, 1, insn);"
[(set_attr "type" "milli")
- (set (attr "length") (const_int 4))])
+ (set (attr "length") (symbol_ref "attr_length_millicode_call (insn, 0)"))])

(define_expand "modsi3"
[(set (reg:SI 26) (match_operand:SI 1 "move_operand" ""))
@@ -4360,27 +4298,7 @@
"*
return output_mod_insn (0, insn);"
[(set_attr "type" "milli")
- (set (attr "length")
- (cond [
-;; Target (or stub) within reach
- (and (lt (plus (symbol_ref "total_code_bytes") (pc))
- (const_int 240000))
- (eq (symbol_ref "TARGET_PORTABLE_RUNTIME")
- (const_int 0)))
- (const_int 4)
-
-;; Out of reach PIC
- (ne (symbol_ref "flag_pic")
- (const_int 0))
- (const_int 24)
-
-;; Out of reach PORTABLE_RUNTIME
- (ne (symbol_ref "TARGET_PORTABLE_RUNTIME")
- (const_int 0))
- (const_int 20)]
-
-;; Out of reach, can use ble
- (const_int 12)))])
+ (set (attr "length") (symbol_ref "attr_length_millicode_call (insn, 0)"))])

(define_insn ""
[(set (reg:SI 29) (mod:SI (reg:SI 26) (reg:SI 25)))
@@ -4393,7 +4311,7 @@
"*
return output_mod_insn (0, insn);"
[(set_attr "type" "milli")
- (set (attr "length") (const_int 4))])
+ (set (attr "length") (symbol_ref "attr_length_millicode_call (insn, 0)"))])

(define_expand "umodsi3"
[(set (reg:SI 26) (match_operand:SI 1 "move_operand" ""))
@@ -4432,27 +4350,7 @@
"*
return output_mod_insn (1, insn);"
[(set_attr "type" "milli")
- (set (attr "length")
- (cond [
-;; Target (or stub) within reach
- (and (lt (plus (symbol_ref "total_code_bytes") (pc))
- (const_int 240000))
- (eq (symbol_ref "TARGET_PORTABLE_RUNTIME")
- (const_int 0)))
- (const_int 4)
-
-;; Out of reach PIC
- (ne (symbol_ref "flag_pic")
- (const_int 0))
- (const_int 24)
-
-;; Out of reach PORTABLE_RUNTIME
- (ne (symbol_ref "TARGET_PORTABLE_RUNTIME")
- (const_int 0))
- (const_int 20)]
-
-;; Out of reach, can use ble
- (const_int 12)))])
+ (set (attr "length") (symbol_ref "attr_length_millicode_call (insn, 0)"))])

(define_insn ""
[(set (reg:SI 29) (umod:SI (reg:SI 26) (reg:SI 25)))
@@ -4465,7 +4363,7 @@
"*
return output_mod_insn (1, insn);"
[(set_attr "type" "milli")
- (set (attr "length") (const_int 4))])
+ (set (attr "length") (symbol_ref "attr_length_millicode_call (insn, 0)"))])

;;- and instructions
;; We define DImode `and` so with DImode `not` we can get
@@ -6036,11 +5934,12 @@
call_insn = emit_call_insn (gen_call_internal_reg (operands[1]));
}

+ if (TARGET_64BIT)
+ use_reg (&CALL_INSN_FUNCTION_USAGE (call_insn), arg_pointer_rtx);
+
if (flag_pic)
{
use_reg (&CALL_INSN_FUNCTION_USAGE (call_insn), pic_offset_table_rtx);
- if (TARGET_64BIT)
- use_reg (&CALL_INSN_FUNCTION_USAGE (call_insn), arg_pointer_rtx);

/* After each call we must restore the PIC register, even if it
doesn't appear to be used. */
@@ -6052,6 +5951,7 @@
(define_insn "call_internal_symref"
[(call (mem:SI (match_operand 0 "call_operand_address" ""))
(match_operand 1 "" "i"))
+ (clobber (reg:SI 1))
(clobber (reg:SI 2))
(use (const_int 0))]
"! TARGET_PORTABLE_RUNTIME"
@@ -6061,21 +5961,7 @@
return output_call (insn, operands[0], 0);
}"
[(set_attr "type" "call")
- (set (attr "length")
-;; If we're sure that we can either reach the target or that the
-;; linker can use a long-branch stub, then the length is at most
-;; 8 bytes.
-;;
-;; For long-calls the length will be at most 68 bytes (non-pic)
-;; or 84 bytes (pic). */
-;; Else we have to use a long-call;
- (if_then_else (lt (plus (symbol_ref "total_code_bytes") (pc))
- (const_int 240000))
- (const_int 8)
- (if_then_else (eq (symbol_ref "flag_pic")
- (const_int 0))
- (const_int 68)
- (const_int 84))))])
+ (set (attr "length") (symbol_ref "attr_length_call (insn, 0)"))])

(define_insn "call_internal_reg_64bit"
[(call (mem:SI (match_operand:DI 0 "register_operand" "r"))
@@ -6086,15 +5972,16 @@
"*
{
/* ??? Needs more work. Length computation, split into multiple insns,
- do not use %r22 directly, expose delay slot. */
- return \"ldd 16(%0),%%r2\;ldd 24(%0),%%r27\;bve,l (%%r2),%%r2\;nop\";
+ expose delay slot. */
+ return \"ldd 16(%0),%%r2\;bve,l (%%r2),%%r2\;ldd 24(%0),%%r27\";
}"
[(set_attr "type" "dyncall")
- (set (attr "length") (const_int 16))])
+ (set (attr "length") (const_int 12))])

(define_insn "call_internal_reg"
[(call (mem:SI (reg:SI 22))
(match_operand 0 "" "i"))
+ (clobber (reg:SI 1))
(clobber (reg:SI 2))
(use (const_int 1))]
""
@@ -6218,11 +6105,13 @@
call_insn = emit_call_insn (gen_call_value_internal_reg (operands[0],
operands[2]));
}
+
+ if (TARGET_64BIT)
+ use_reg (&CALL_INSN_FUNCTION_USAGE (call_insn), arg_pointer_rtx);
+
if (flag_pic)
{
use_reg (&CALL_INSN_FUNCTION_USAGE (call_insn), pic_offset_table_rtx);
- if (TARGET_64BIT)
- use_reg (&CALL_INSN_FUNCTION_USAGE (call_insn), arg_pointer_rtx);

/* After each call we must restore the PIC register, even if it
doesn't appear to be used. */
@@ -6235,6 +6124,7 @@
[(set (match_operand 0 "" "=rf")
(call (mem:SI (match_operand 1 "call_operand_address" ""))
(match_operand 2 "" "i")))
+ (clobber (reg:SI 1))
(clobber (reg:SI 2))
(use (const_int 0))]
;;- Don't use operand 1 for most machines.
@@ -6245,21 +6135,7 @@
return output_call (insn, operands[1], 0);
}"
[(set_attr "type" "call")
- (set (attr "length")
-;; If we're sure that we can either reach the target or that the
-;; linker can use a long-branch stub, then the length is at most
-;; 8 bytes.
-;;
-;; For long-calls the length will be at most 68 bytes (non-pic)
-;; or 84 bytes (pic). */
-;; Else we have to use a long-call;
- (if_then_else (lt (plus (symbol_ref "total_code_bytes") (pc))
- (const_int 240000))
- (const_int 8)
- (if_then_else (eq (symbol_ref "flag_pic")
- (const_int 0))
- (const_int 68)
- (const_int 84))))])
+ (set (attr "length") (symbol_ref "attr_length_call (insn, 0)"))])

(define_insn "call_value_internal_reg_64bit"
[(set (match_operand 0 "" "=rf")
@@ -6271,16 +6147,17 @@
"*
{
/* ??? Needs more work. Length computation, split into multiple insns,
- do not use %r22 directly, expose delay slot. */
- return \"ldd 16(%1),%%r2\;ldd 24(%1),%%r27\;bve,l (%%r2),%%r2\;nop\";
+ expose delay slot. */
+ return \"ldd 16(%1),%%r2\;bve,l (%%r2),%%r2\;ldd 24(%1),%%r27\";
}"
[(set_attr "type" "dyncall")
- (set (attr "length") (const_int 16))])
+ (set (attr "length") (const_int 12))])

(define_insn "call_value_internal_reg"
[(set (match_operand 0 "" "=rf")
(call (mem:SI (reg:SI 22))
(match_operand 1 "" "i")))
+ (clobber (reg:SI 1))
(clobber (reg:SI 2))
(use (const_int 1))]
""
@@ -6389,10 +6266,9 @@
}")

(define_expand "sibcall"
- [(parallel [(call (match_operand:SI 0 "" "")
- (match_operand 1 "" ""))
- (clobber (reg:SI 0))])]
- "! TARGET_PORTABLE_RUNTIME"
+ [(call (match_operand:SI 0 "" "")
+ (match_operand 1 "" ""))]
+ "!TARGET_PORTABLE_RUNTIME"
"
{
rtx op;
@@ -6400,8 +6276,21 @@

op = XEXP (operands[0], 0);

- /* We do not allow indirect sibling calls. */
- call_insn = emit_call_insn (gen_sibcall_internal_symref (op, operands[1]));
+ if (TARGET_64BIT)
+ emit_move_insn (arg_pointer_rtx,
+ gen_rtx_PLUS (word_mode, virtual_outgoing_args_rtx,
+ GEN_INT (64)));
+
+ /* Indirect sibling calls are not allowed. */
+ if (TARGET_64BIT)
+ call_insn = gen_sibcall_internal_symref_64bit (op, operands[1]);
+ else
+ call_insn = gen_sibcall_internal_symref (op, operands[1]);
+
+ call_insn = emit_call_insn (call_insn);
+
+ if (TARGET_64BIT)
+ use_reg (&CALL_INSN_FUNCTION_USAGE (call_insn), arg_pointer_rtx);

if (flag_pic)
{
@@ -6417,38 +6306,39 @@
(define_insn "sibcall_internal_symref"
[(call (mem:SI (match_operand 0 "call_operand_address" ""))
(match_operand 1 "" "i"))
- (clobber (reg:SI 0))
+ (clobber (reg:SI 1))
(use (reg:SI 2))
(use (const_int 0))]
- "! TARGET_PORTABLE_RUNTIME"
+ "!TARGET_PORTABLE_RUNTIME && !TARGET_64BIT"
"*
{
output_arg_descriptor (insn);
return output_call (insn, operands[0], 1);
}"
[(set_attr "type" "call")
- (set (attr "length")
-;; If we're sure that we can either reach the target or that the
-;; linker can use a long-branch stub, then the length is at most
-;; 8 bytes.
-;;
-;; For long-calls the length will be at most 68 bytes (non-pic)
-;; or 84 bytes (pic). */
-;; Else we have to use a long-call;
- (if_then_else (lt (plus (symbol_ref "total_code_bytes") (pc))
- (const_int 240000))
- (const_int 8)
- (if_then_else (eq (symbol_ref "flag_pic")
- (const_int 0))
- (const_int 68)
- (const_int 84))))])
+ (set (attr "length") (symbol_ref "attr_length_call (insn, 1)"))])
+
+(define_insn "sibcall_internal_symref_64bit"
+ [(call (mem:SI (match_operand 0 "call_operand_address" ""))
+ (match_operand 1 "" "i"))
+ (clobber (reg:SI 1))
+ (clobber (reg:SI 27))
+ (use (reg:SI 2))
+ (use (const_int 0))]
+ "TARGET_64BIT"
+ "*
+{
+ output_arg_descriptor (insn);
+ return output_call (insn, operands[0], 1);
+}"
+ [(set_attr "type" "call")
+ (set (attr "length") (symbol_ref "attr_length_call (insn, 1)"))])

(define_expand "sibcall_value"
- [(parallel [(set (match_operand 0 "" "")
+ [(set (match_operand 0 "" "")
(call (match_operand:SI 1 "" "")
- (match_operand 2 "" "")))
- (clobber (reg:SI 0))])]
- "! TARGET_PORTABLE_RUNTIME"
+ (match_operand 2 "" "")))]
+ "!TARGET_PORTABLE_RUNTIME"
"
{
rtx op;
@@ -6456,10 +6346,24 @@

op = XEXP (operands[1], 0);

- /* We do not allow indirect sibling calls. */
- call_insn = emit_call_insn (gen_sibcall_value_internal_symref (operands[0],
- op,
- operands[2]));
+ if (TARGET_64BIT)
+ emit_move_insn (arg_pointer_rtx,
+ gen_rtx_PLUS (word_mode, virtual_outgoing_args_rtx,
+ GEN_INT (64)));
+
+ /* Indirect sibling calls are not allowed. */
+ if (TARGET_64BIT)
+ call_insn
+ = gen_sibcall_value_internal_symref_64bit (operands[0], op, operands[2]);
+ else
+ call_insn
+ = gen_sibcall_value_internal_symref (operands[0], op, operands[2]);
+
+ call_insn = emit_call_insn (call_insn);
+
+ if (TARGET_64BIT)
+ use_reg (&CALL_INSN_FUNCTION_USAGE (call_insn), arg_pointer_rtx);
+
if (flag_pic)
{
use_reg (&CALL_INSN_FUNCTION_USAGE (call_insn), pic_offset_table_rtx);
@@ -6475,32 +6379,34 @@
[(set (match_operand 0 "" "=rf")
(call (mem:SI (match_operand 1 "call_operand_address" ""))
(match_operand 2 "" "i")))
- (clobber (reg:SI 0))
+ (clobber (reg:SI 1))
(use (reg:SI 2))
(use (const_int 0))]
- ;;- Don't use operand 1 for most machines.
- "! TARGET_PORTABLE_RUNTIME"
+ "!TARGET_PORTABLE_RUNTIME && !TARGET_64BIT"
"*
{
output_arg_descriptor (insn);
return output_call (insn, operands[1], 1);
}"
[(set_attr "type" "call")
- (set (attr "length")
-;; If we're sure that we can either reach the target or that the
-;; linker can use a long-branch stub, then the length is at most
-;; 8 bytes.
-;;
-;; For long-calls the length will be at most 68 bytes (non-pic)
-;; or 84 bytes (pic). */
-;; Else we have to use a long-call;
- (if_then_else (lt (plus (symbol_ref "total_code_bytes") (pc))
- (const_int 240000))
- (const_int 8)
- (if_then_else (eq (symbol_ref "flag_pic")
- (const_int 0))
- (const_int 68)
- (const_int 84))))])
+ (set (attr "length") (symbol_ref "attr_length_call (insn, 1)"))])
+
+(define_insn "sibcall_value_internal_symref_64bit"
+ [(set (match_operand 0 "" "=rf")
+ (call (mem:SI (match_operand 1 "call_operand_address" ""))
+ (match_operand 2 "" "i")))
+ (clobber (reg:SI 1))
+ (clobber (reg:SI 27))
+ (use (reg:SI 2))
+ (use (const_int 0))]
+ "TARGET_64BIT"
+ "*
+{
+ output_arg_descriptor (insn);
+ return output_call (insn, operands[1], 1);
+}"
+ [(set_attr "type" "call")
+ (set (attr "length") (symbol_ref "attr_length_call (insn, 1)"))])

(define_insn "nop"
[(const_int 0)]
@@ -7392,6 +7298,12 @@
"!TARGET_64BIT"
"*
{
+ int length = get_attr_length (insn);
+ rtx xoperands[2];
+
+ xoperands[0] = GEN_INT (length - 8);
+ xoperands[1] = GEN_INT (length - 16);
+
/* Must import the magic millicode routine. */
output_asm_insn (\".IMPORT $$sh_func_adrs,MILLICODE\", NULL);

@@ -7400,60 +7312,24 @@
First, copy our input parameter into %r29 just in case we don't
need to call $$sh_func_adrs. */
output_asm_insn (\"copy %%r26,%%r29\", NULL);
+ output_asm_insn (\"{extru|extrw,u} %%r26,31,2,%%r31\", NULL);

/* Next, examine the low two bits in %r26, if they aren't 0x2, then
we use %r26 unchanged. */
- if (get_attr_length (insn) == 32)
- output_asm_insn (\"{extru|extrw,u} %%r26,31,2,%%r31\;{comib|cmpib},<>,n 2,%%r31,.+24\", NULL);
- else if (get_attr_length (insn) == 40)
- output_asm_insn (\"{extru|extrw,u} %%r26,31,2,%%r31\;{comib|cmpib},<>,n 2,%%r31,.+32\", NULL);
- else if (get_attr_length (insn) == 44)
- output_asm_insn (\"{extru|extrw,u} %%r26,31,2,%%r31\;{comib|cmpib},<>,n 2,%%r31,.+36\", NULL);
- else
- output_asm_insn (\"{extru|extrw,u} %%r26,31,2,%%r31\;{comib|cmpib},<>,n 2,%%r31,.+20\", NULL);
+ output_asm_insn (\"{comib|cmpib},<>,n 2,%%r31,.+%0\", xoperands);
+ output_asm_insn (\"ldi 4096,%%r31\", NULL);

/* Next, compare %r26 with 4096, if %r26 is less than or equal to
- 4096, then we use %r26 unchanged. */
- if (get_attr_length (insn) == 32)
- output_asm_insn (\"ldi 4096,%%r31\;{comb|cmpb},<<,n %%r26,%%r31,.+16\",
- NULL);
- else if (get_attr_length (insn) == 40)
- output_asm_insn (\"ldi 4096,%%r31\;{comb|cmpb},<<,n %%r26,%%r31,.+24\",
- NULL);
- else if (get_attr_length (insn) == 44)
- output_asm_insn (\"ldi 4096,%%r31\;{comb|cmpb},<<,n %%r26,%%r31,.+28\",
- NULL);
- else
- output_asm_insn (\"ldi 4096,%%r31\;{comb|cmpb},<<,n %%r26,%%r31,.+12\",
- NULL);
+ 4096, then again we use %r26 unchanged. */
+ output_asm_insn (\"{comb|cmpb},<<,n %%r26,%%r31,.+%1\", xoperands);

- /* Else call $$sh_func_adrs to extract the function's real add24. */
+ /* Finally, call $$sh_func_adrs to extract the function's real add24. */
return output_millicode_call (insn,
gen_rtx_SYMBOL_REF (SImode,
- \"$$sh_func_adrs\"));
+ \"$$sh_func_adrs\"));
}"
[(set_attr "type" "multi")
- (set (attr "length")
- (cond [
-;; Target (or stub) within reach
- (and (lt (plus (symbol_ref "total_code_bytes") (pc))
- (const_int 240000))
- (eq (symbol_ref "TARGET_PORTABLE_RUNTIME")
- (const_int 0)))
- (const_int 28)
-
-;; Out of reach PIC
- (ne (symbol_ref "flag_pic")
- (const_int 0))
- (const_int 44)
-
-;; Out of reach PORTABLE_RUNTIME
- (ne (symbol_ref "TARGET_PORTABLE_RUNTIME")
- (const_int 0))
- (const_int 40)]
-
-;; Out of reach, can use ble
- (const_int 32)))])
+ (set (attr "length") (symbol_ref "attr_length_millicode_call (insn, 20)"))])

;; On the PA, the PIC register is call clobbered, so it must
;; be saved & restored around calls by the caller. If the call
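
[Editorial illustration, not part of the patch.] The $$sh_func_adrs hunk above drops the per-length tables of branch displacements and instead derives both displacements from the insn length (`length - 8` and `length - 16`). The following is a minimal sketch, with hypothetical helper names that do not exist in pa.md or pa.c, checking that the new arithmetic reproduces the displacements the removed code hard-coded for lengths 28, 32, 40 and 44. It says nothing about lengths that attr_length_millicode_call might now return outside that old table.

```c
/* Illustration only: these helpers are hypothetical and are not part of
   pa.md or pa.c.  */

/* Displacement of the comib/cmpib branch as the removed code chose it,
   keyed on the insn length in bytes.  */
static int
old_first_disp (int length)
{
  switch (length)
    {
    case 32: return 24;
    case 40: return 32;
    case 44: return 36;
    default: return 20;		/* The 28-byte case.  */
    }
}

/* Displacement of the comb/cmpb branch as the removed code chose it.  */
static int
old_second_disp (int length)
{
  switch (length)
    {
    case 32: return 16;
    case 40: return 24;
    case 44: return 28;
    default: return 12;		/* The 28-byte case.  */
    }
}

/* The patch computes xoperands[0] and xoperands[1] this way.  */
static int new_first_disp (int length) { return length - 8; }
static int new_second_disp (int length) { return length - 16; }

/* Return 1 if both schemes agree for every length in the old table.  */
static int
displacements_agree (void)
{
  static const int lengths[] = { 28, 32, 40, 44 };
  unsigned i;

  for (i = 0; i < sizeof lengths / sizeof lengths[0]; i++)
    if (old_first_disp (lengths[i]) != new_first_disp (lengths[i])
	|| old_second_disp (lengths[i]) != new_second_disp (lengths[i]))
      return 0;
  return 1;
}
```

Calling `displacements_agree ()` is a quick way to confirm that the simplification does not change the emitted branch targets for the four lengths the old `cond` could produce.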
Index: config/pa/som.h
===================================================================
RCS file: /cvsroot/gcc/gcc/gcc/config/pa/som.h,v
retrieving revision 1.38
diff -u -3 -p -r1.38 som.h
--- config/pa/som.h 29 Aug 2002 21:16:35 -0000 1.38
+++ config/pa/som.h 30 Oct 2002 17:06:46 -0000
@@ -371,3 +371,7 @@ do { \
on the location of the GCC tool directory. The downside is GCC
cannot be moved after installation using a symlink. */
#define ALWAYS_STRIP_DOTDOT 1
+
+/* Aggregates with a single float or double field should be passed and
+ returned in the general registers. */
+#define MEMBER_TYPE_FORCES_BLK(FIELD, MODE) (MODE==SFmode || MODE==DFmode)
Index: config/pa/t-pa64
===================================================================
RCS file: /cvsroot/gcc/gcc/gcc/config/pa/t-pa64,v
retrieving revision 1.6
diff -u -3 -p -r1.6 t-pa64
--- config/pa/t-pa64 30 Apr 2002 19:47:38 -0000 1.6
+++ config/pa/t-pa64 30 Oct 2002 17:06:46 -0000
@@ -1,4 +1,4 @@
-TARGET_LIBGCC2_CFLAGS = -fPIC -Dpa64=1 -DELF=1
+TARGET_LIBGCC2_CFLAGS = -fPIC -Dpa64=1 -DELF=1 -mlong-calls

LIB2FUNCS_EXTRA=quadlib.c

Index: doc/invoke.texi
===================================================================
RCS file: /cvsroot/gcc/gcc/gcc/doc/invoke.texi,v
retrieving revision 1.196
diff -u -3 -p -r1.196 invoke.texi
--- doc/invoke.texi 20 Oct 2002 19:18:30 -0000 1.196
+++ doc/invoke.texi 30 Oct 2002 17:06:48 -0000
@@ -508,7 +508,7 @@ in the following sections.
-march=@var{architecture-type} @gol
-mbig-switch -mdisable-fpregs -mdisable-indexing @gol
-mfast-indirect-calls -mgas -mgnu-ld -mhp-ld @gol
--mjump-in-delay -mlinker-opt @gol
+-mjump-in-delay -mlinker-opt -mlong-calls @gol
-mlong-load-store -mno-big-switch -mno-disable-fpregs @gol
-mno-disable-indexing -mno-fast-indirect-calls -mno-gas @gol
-mno-jump-in-delay -mno-long-load-store @gol
@@ -8093,6 +8093,33 @@ ld. The ld that is called is determined
configure option, gcc's program search path, and finally by the user's
@env{PATH}. The linker used by GCC can be printed using @samp{which
`gcc -print-prog-name=ld`}.
+
+@item -mlong-calls
+@opindex mno-long-calls
+Generate code that uses long call sequences. This ensures that a call
+is always able to reach linker generated stubs. The default is to generate
+long calls only when the distance from the call site to the beginning
+of the function or translation unit, as the case may be, exceeds a
+predefined limit set by the branch type being used. The limits for
+normal calls are 7,600,000 and 240,000 bytes, respectively for the
+PA 2.0 and PA 1.X architectures. Sibcalls are always limited at
+240,000 bytes.
+
+Distances are measured from the beginning of functions when using the
+@option{-ffunction-sections} option, or when using the @option{-mgas}
+and @option{-mno-portable-runtime} options together under HP-UX with
+the SOM linker.
+
+It is normally not desirable to use this option as it will degrade
+performance. However, it may be useful in large applications,
+particularly when partial linking is used to build the application.
+
+The types of long calls used depends on the capabilities of the
+assembler and linker, and the type of code being generated. The
+impact on systems that support long absolute calls, and long pic
+symbol-difference or pc-relative calls should be relatively small.
+However, an indirect call is used on 32-bit ELF systems in pic code
+and it is quite long.

@end table
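
[Editorial illustration, not part of the patch.] The -mlong-calls documentation above describes a default rule: a long call is used only when the measured distance exceeds a limit that depends on the architecture and on whether the call is a sibcall. A rough sketch of that rule follows. The enum, the function names and the use of a strict "greater than" comparison are assumptions made for illustration; the real decision is made by attr_length_call in pa.c, which is not shown in this message.

```c
#include <stdbool.h>

/* Hypothetical names, for illustration only.  */
enum pa_arch { PA_1_X, PA_2_0 };

/* Reach limit in bytes, using the figures quoted in the documentation:
   7,600,000 for normal calls on PA 2.0, 240,000 on PA 1.X, and always
   240,000 for sibcalls.  */
static long
call_reach_limit (enum pa_arch arch, bool sibcall)
{
  if (sibcall)
    return 240000L;
  return arch == PA_2_0 ? 7600000L : 240000L;
}

/* Return true if a long call sequence would be chosen.  DISTANCE is the
   distance in bytes from the call site to the beginning of the function
   or translation unit, whichever applies.  */
static bool
needs_long_call (long distance, enum pa_arch arch, bool sibcall,
		 bool mlong_calls)
{
  if (mlong_calls)
    return true;		/* -mlong-calls forces long sequences.  */
  return distance > call_reach_limit (arch, sibcall);
}
```

For example, a normal call 300,000 bytes into a PA 2.0 translation unit stays short under this model, while the same distance on PA 1.X, or for a sibcall on either architecture, does not.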
John David Anglin
2002-11-09 23:18:41 UTC
I noticed today a new fail on hppa64-hp-hpux11.11, vthunk3.C. It fails
scan-assembler _ZTvn4_n20_N1E1bEv. This doesn't seem like the correct
symbol name to test for on a 64-bit target. I see _ZTvn8_n40_N1E1bEv or
This might also be the reason why

FAIL: g++.dg/abi/vague1.C scan-assembler-not _ZN1AIiE1tE

fails.

Dave
--
J. David Anglin ***@nrc.ca
National Research Council of Canada (613) 990-0752 (FAX: 952-6605)
John David Anglin
2002-11-09 23:27:05 UTC
Post by John David Anglin
This might also be the reason why
FAIL: g++.dg/abi/vague1.C scan-assembler-not _ZN1AIiE1tE
No, it's because there is a .IMPORT/.type directive for _ZN1AIiE1tE
in the assembler file. It needs to be XFAILed on HPPA HPUX.

Dave
--
J. David Anglin ***@nrc.ca
National Research Council of Canada (613) 990-0752 (FAX: 952-6605)
Mark Mitchell
2002-11-11 02:19:18 UTC
--On Saturday, November 09, 2002 06:18:41 PM -0500 John David Anglin
Post by John David Anglin
I noticed today a new fail on hppa64-hp-hpux11.11, vthunk3.C. It fails
scan-assembler _ZTvn4_n20_N1E1bEv. This doesn't seem like the correct
symbol name to test for on a 64-bit target. I see _ZTvn8_n40_N1E1bEv or
Yes; I should have had an x86-only flag in there. Fixed with the attached
patch, applied on the mainline.
Post by John David Anglin
This might also be the reason why
FAIL: g++.dg/abi/vague1.C scan-assembler-not _ZN1AIiE1tE
No; that looks like something different to me.
--
Mark Mitchell ***@codesourcery.com
CodeSourcery, LLC http://www.codesourcery.com

2002-11-10 Mark Mitchell <***@codesourcery.com>

* g++.dg/abi/vthunk3.C: Run only on x86.

Index: vthunk3.C
===================================================================
RCS file: /cvs/gcc/gcc/gcc/testsuite/g++.dg/abi/vthunk3.C,v
retrieving revision 1.1
retrieving revision 1.2
diff -c -5 -p -r1.1 -r1.2
*** vthunk3.C 8 Nov 2002 02:16:48 -0000 1.1
--- vthunk3.C 11 Nov 2002 02:20:37 -0000 1.2
***************
*** 1,6 ****
! // { dg-do compile }
// { dg-options "-fabi-version=0" }

struct A {
virtual void a ();
};
--- 1,6 ----
! // { dg-do compile { target i?86-*-* } }
// { dg-options "-fabi-version=0" }

struct A {
virtual void a ();
};
John David Anglin
2002-11-11 03:42:11 UTC
Post by Mark Mitchell
Post by John David Anglin
This might also be the reason why
FAIL: g++.dg/abi/vague1.C scan-assembler-not _ZN1AIiE1tE
No; that looks like something different to me.
From examination of the assembler output, my understanding of why this
fails on hppa*-*-hpux* is that the symbol occurs in the assembler output
because we define ASM_OUTPUT_EXTERNAL to provide the correct type for
undefined external references. This happens even for symbols that
eventually turn out not to be needed.

The enclosed patch avoids the failure. It's not ideal, but I doubt it's
worth the effort to try to eliminate references in .import/.type directives.

Ok for main?

Dave
--
J. David Anglin ***@nrc.ca
National Research Council of Canada (613) 990-0752 (FAX: 952-6605)

2002-11-10 John David Anglin <***@hiauly1.hia.nrc.ca>

* g++.dg/abi/vague1.C (dg-final): Return if target is hppa*-*-hpux*.

Index: g++.dg/abi/vague1.C
===================================================================
RCS file: /cvsroot/gcc/gcc/gcc/testsuite/g++.dg/abi/vague1.C,v
retrieving revision 1.1
diff -u -3 -p -r1.1 vague1.C
--- g++.dg/abi/vague1.C 15 Mar 2002 09:54:42 -0000 1.1
+++ g++.dg/abi/vague1.C 11 Nov 2002 03:28:39 -0000
@@ -2,7 +2,9 @@
// instantiations.

// Disable debug info so we don't get confused by the symbol name there.
+// The test fails on hppa*-*-hpux* because the symbol _ZN1AIiE1tE is imported.
// { dg-options "-g0" }
+// { dg-final { if { [istarget hppa*-*-hpux*] } { return } } }
// { dg-final { scan-assembler-not "_ZN1AIiE1tE" } }

template <class T> struct A {
John David Anglin
2002-11-13 21:12:06 UTC
ldd: "gengenrtl" is not a shared executable.
collect2: /usr/ccs/bin/ldd returned 1 exit status
I have installed the patch below and it should fix the above problem.
I forgot to say it was tested with both HP and GNU ld on hppa64-hp-hpux11.11.

Dave
--
J. David Anglin ***@nrc.ca
National Research Council of Canada (613) 990-0752 (FAX: 952-6605)
John David Anglin
2002-11-26 19:16:14 UTC
See

<http://gcc.gnu.org/ml/gcc-patches/2002-11/msg00105.html>.

Dave
--
J. David Anglin ***@nrc.ca
National Research Council of Canada (613) 990-0752 (FAX: 952-6605)
John David Anglin
2002-11-26 23:41:14 UTC
* expr.c (gen_group_rtx, emit_group_move): New functions.
* expr.h (gen_group_rtx, emit_group_move): Prototype.
* function.c (expand_function_start): Use gen_group_rtx to create a
PARALLEL rtx to hold the return value when the real return rtx is a
PARALLEL.
(expand_function_end): Use emit_group_move to move the return value
from a PARALLEL to the real return registers.
* rtl.h (REG_FUNCTION_VALUE_P): Allow function values to be returned
in PARALLELs.
Oops, I managed to lose the initialization of the first vector component
when the src is NULL. Doesn't affect hppa64-hp-hpux11 as it doesn't use
this feature.

Dave
--
J. David Anglin ***@nrc.ca
National Research Council of Canada (613) 990-0752 (FAX: 952-6605)

2002-11-26 John David Anglin <***@hiauly1.hia.nrc.ca>

* expr.c (gen_group_rtx, emit_group_move): New functions.
* expr.h (gen_group_rtx, emit_group_move): Prototype.
* function.c (expand_function_start): Use gen_group_rtx to create a
PARALLEL rtx to hold the return value when the real return rtx is a
PARALLEL.
(expand_function_end): Use emit_group_move to move the return value
from a PARALLEL to the real return registers.
* rtl.h (REG_FUNCTION_VALUE_P): Allow function values to be returned
in PARALLELs.

Index: expr.c
===================================================================
RCS file: /cvsroot/gcc/gcc/gcc/expr.c,v
retrieving revision 1.493
diff -u -3 -p -r1.493 expr.c
--- expr.c 20 Nov 2002 21:52:58 -0000 1.493
+++ expr.c 26 Nov 2002 23:09:33 -0000
@@ -2203,6 +2203,42 @@ move_block_from_reg (regno, x, nregs, si
}
}

+/* Generate a PARALLEL rtx for a new non-consecutive group of registers from
+ ORIG, where ORIG is a non-consecutive group of registers represented by
+ a PARALLEL. The clone is identical to the original except in that the
+ original set of registers is replaced by a new set of pseudo registers.
+ The new set has the same modes as the original set. */
+
+rtx
+gen_group_rtx (orig)
+ rtx orig;
+{
+ int i, length;
+ rtx *tmps;
+
+ if (GET_CODE (orig) != PARALLEL)
+ abort ();
+
+ length = XVECLEN (orig, 0);
+ tmps = (rtx *) alloca (sizeof (rtx) * length);
+
+ /* Skip a NULL entry in first slot. */
+ i = XEXP (XVECEXP (orig, 0, 0), 0) ? 0 : 1;
+
+ if (i)
+ tmps[0] = 0;
+
+ for (; i < length; i++)
+ {
+ enum machine_mode mode = GET_MODE (XEXP (XVECEXP (orig, 0, i), 0));
+ rtx offset = XEXP (XVECEXP (orig, 0, i), 1);
+
+ tmps[i] = gen_rtx_EXPR_LIST (VOIDmode, gen_reg_rtx (mode), offset);
+ }
+
+ return gen_rtx_PARALLEL (GET_MODE (orig), gen_rtvec_v (length, tmps));
+}
+
/* Emit code to move a block SRC to a block DST, where DST is non-consecutive
registers represented by a PARALLEL. SSIZE represents the total size of
block SRC in bytes, or -1 if not known. */
@@ -2322,6 +2358,26 @@ emit_group_load (dst, orig_src, ssize)
/* Copy the extracted pieces into the proper (probable) hard regs. */
for (i = start; i < XVECLEN (dst, 0); i++)
emit_move_insn (XEXP (XVECEXP (dst, 0, i), 0), tmps[i]);
+}
+
+/* Emit code to move a block SRC to block DST, where SRC and DST are
+ non-consecutive groups of registers, each represented by a PARALLEL. */
+
+void
+emit_group_move (dst, src)
+ rtx dst, src;
+{
+ int i;
+
+ if (GET_CODE (src) != PARALLEL
+ || GET_CODE (dst) != PARALLEL
+ || XVECLEN (src, 0) != XVECLEN (dst, 0))
+ abort ();
+
+ /* Skip first entry if NULL. */
+ for (i = XEXP (XVECEXP (src, 0, 0), 0) ? 0 : 1; i < XVECLEN (src, 0); i++)
+ emit_move_insn (XEXP (XVECEXP (dst, 0, i), 0),
+ XEXP (XVECEXP (src, 0, i), 0));
}

/* Emit code to move a block SRC to a block DST, where SRC is non-consecutive
Index: expr.h
===================================================================
RCS file: /cvsroot/gcc/gcc/gcc/expr.h,v
retrieving revision 1.123
diff -u -3 -p -r1.123 expr.h
--- expr.h 22 Sep 2002 14:09:31 -0000 1.123
+++ expr.h 26 Nov 2002 23:09:34 -0000
@@ -412,9 +412,16 @@ extern void move_block_to_reg PARAMS ((i
The number of registers to be filled is NREGS. */
extern void move_block_from_reg PARAMS ((int, rtx, int, int));

+/* Generate a non-consecutive group of registers represented by a PARALLEL. */
+extern rtx gen_group_rtx PARAMS ((rtx));
+
/* Load a BLKmode value into non-consecutive registers represented by a
PARALLEL. */
extern void emit_group_load PARAMS ((rtx, rtx, int));
+
+/* Move a non-consecutive group of registers represented by a PARALLEL into
+ a non-consecutive group of registers represented by a PARALLEL. */
+extern void emit_group_move PARAMS ((rtx, rtx));

/* Store a BLKmode value from non-consecutive registers represented by a
PARALLEL. */
Index: function.c
===================================================================
RCS file: /cvsroot/gcc/gcc/gcc/function.c,v
retrieving revision 1.387
diff -u -3 -p -r1.387 function.c
--- function.c 14 Oct 2002 21:19:04 -0000 1.387
+++ function.c 26 Nov 2002 23:09:36 -0000
@@ -6559,18 +6559,17 @@ expand_function_start (subr, parms_have_
subr, 1);

/* Structures that are returned in registers are not aggregate_value_p,
- so we may see a PARALLEL. Don't play pseudo games with this. */
- if (! REG_P (hard_reg))
- SET_DECL_RTL (DECL_RESULT (subr), hard_reg);
+ so we may see a PARALLEL or a REG. */
+ if (REG_P (hard_reg))
+ SET_DECL_RTL (DECL_RESULT (subr), gen_reg_rtx (GET_MODE (hard_reg)));
+ else if (GET_CODE (hard_reg) == PARALLEL)
+ SET_DECL_RTL (DECL_RESULT (subr), gen_group_rtx (hard_reg));
else
- {
- /* Create the pseudo. */
- SET_DECL_RTL (DECL_RESULT (subr), gen_reg_rtx (GET_MODE (hard_reg)));
+ abort ();

- /* Needed because we may need to move this to memory
- in case it's a named return value whose address is taken. */
- DECL_REGISTER (DECL_RESULT (subr)) = 1;
- }
+ /* Set DECL_REGISTER flag so that expand_function_end will copy the
+ result to the real return register(s). */
+ DECL_REGISTER (DECL_RESULT (subr)) = 1;
}

/* Initialize rtx for parameters and local variables.
@@ -6998,8 +6997,16 @@ expand_function_end (filename, line, end
convert_move (real_decl_rtl, decl_rtl, unsignedp);
}
else if (GET_CODE (real_decl_rtl) == PARALLEL)
- emit_group_load (real_decl_rtl, decl_rtl,
- int_size_in_bytes (TREE_TYPE (decl_result)));
+ {
+ /* If expand_function_start has created a PARALLEL for decl_rtl,
+ move the result to the real return registers. Otherwise, do
+ a group load from decl_rtl for a named return. */
+ if (GET_CODE (decl_rtl) == PARALLEL)
+ emit_group_move (real_decl_rtl, decl_rtl);
+ else
+ emit_group_load (real_decl_rtl, decl_rtl,
+ int_size_in_bytes (TREE_TYPE (decl_result)));
+ }
else
emit_move_insn (real_decl_rtl, decl_rtl);
}
Index: rtl.h
===================================================================
RCS file: /cvsroot/gcc/gcc/gcc/rtl.h,v
retrieving revision 1.374
diff -u -3 -p -r1.374 rtl.h
--- rtl.h 4 Nov 2002 16:57:03 -0000 1.374
+++ rtl.h 26 Nov 2002 23:09:36 -0000
@@ -185,7 +185,7 @@ struct rtx_def GTY((chain_next ("RTX_NEX
has used it as the function. */
unsigned int used : 1;
/* Nonzero if this rtx came from procedure integration.
- 1 in a REG means this reg refers to the return value
+ 1 in a REG or PARALLEL means this rtx refers to the return value
of the current function.
1 in a SYMBOL_REF if the symbol is weak. */
unsigned integrated : 1;
@@ -988,9 +988,10 @@ enum label_kind
#define REGNO(RTX) XCUINT (RTX, 0, REG)
#define ORIGINAL_REGNO(RTX) X0UINT (RTX, 1)

-/* 1 if RTX is a reg that is the current function's return value. */
+/* 1 if RTX is a reg or parallel that is the current function's return
+ value. */
#define REG_FUNCTION_VALUE_P(RTX) \
- (RTL_FLAG_CHECK1("REG_FUNCTION_VALUE_P", (RTX), REG)->integrated)
+ (RTL_FLAG_CHECK2("REG_FUNCTION_VALUE_P", (RTX), REG, PARALLEL)->integrated)

/* 1 if RTX is a reg that corresponds to a variable declared by the user. */
#define REG_USERVAR_P(RTX) \
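
[Editorial illustration, not part of the patch.] The two new functions in expr.c can be pictured with a much simpler data model: a PARALLEL becomes an array of (register, offset) slots whose first slot may be empty. The sketch below is only a model under that assumption. It uses no GCC types, all names are hypothetical, and it returns -1 where the real code calls abort ().

```c
#include <stddef.h>

/* Hypothetical: register numbers at or above this are pseudos.  */
#define FIRST_PSEUDO 100

/* One slot of a modeled PARALLEL.  REGNO is -1 for the NULL entry that
   may occupy the first slot.  */
struct group_slot
{
  int regno;
  long offset;
};

static int next_pseudo = FIRST_PSEUDO;

/* Model of gen_group_rtx: fill CLONE from ORIG, giving every real slot
   a fresh pseudo while keeping the offsets and a leading NULL slot.  */
static void
model_gen_group (const struct group_slot *orig, struct group_slot *clone,
		 size_t length)
{
  size_t i = orig[0].regno >= 0 ? 0 : 1;

  if (i)
    clone[0] = orig[0];		/* Preserve the NULL first slot.  */

  for (; i < length; i++)
    {
      clone[i].regno = next_pseudo++;
      clone[i].offset = orig[i].offset;
    }
}

/* Model of emit_group_move: copy register values slot by slot from SRC
   to DST, skipping a NULL first slot.  REGS stands in for the register
   file.  Returns 0 on success, -1 if the groups do not match.  */
static int
model_group_move (long *regs, const struct group_slot *dst,
		  const struct group_slot *src, size_t dst_len,
		  size_t src_len)
{
  size_t i;

  if (src_len == 0 || dst_len != src_len)
    return -1;

  for (i = src[0].regno >= 0 ? 0 : 1; i < src_len; i++)
    regs[dst[i].regno] = regs[src[i].regno];
  return 0;
}
```

In this model, expand_function_start would call `model_gen_group` on the hard return registers to obtain pseudos, and expand_function_end would call `model_group_move` to copy the pseudos back into the hard registers.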
Richard Henderson
2002-11-27 01:43:47 UTC
* expr.c (gen_group_rtx, emit_group_move): New functions.
* expr.h (gen_group_rtx, emit_group_move): Prototype.
* function.c (expand_function_start): Use gen_group_rtx to create a
PARALLEL rtx to hold the return value when the real return rtx is a
PARALLEL.
(expand_function_end): Use emit_group_move to move the return value
from a PARALLEL to the real return registers.
* rtl.h (REG_FUNCTION_VALUE_P): Allow function values to be returned
in PARALLELs.
Ok.


r~
John David Anglin
2002-11-27 03:51:01 UTC
../../gcc/gcc/config/pa/pa.c -o pa.o
../../gcc/gcc/config/pa/pa.c:206: error: `default_comp_type_attributes' undeclared here (not in a function)
This appears to be a problem with the updating of ***@subversions.gnu.org.
Possibly, the problem will disappear at the next update. The changes to
target-def.h aren't there yet.

Dave
--
J. David Anglin ***@nrc.ca
National Research Council of Canada (613) 990-0752 (FAX: 952-6605)
Krister Walfridsson
2002-11-28 20:06:25 UTC
libf2c configuration patch
http://gcc.gnu.org/ml/gcc-patches/2002-10/msg01122.html

/Krister
Toon Moene
2002-11-28 21:59:53 UTC
Post by Krister Walfridsson
libf2c configuration patch
http://gcc.gnu.org/ml/gcc-patches/2002-10/msg01122.html
OK for mainline. Sorry I missed this; it fell in my autumn vacation ...

Cheers,
--
Toon Moene - mailto:***@moene.indiv.nluug.nl - phoneto: +31 346 214290
Saturnushof 14, 3738 XG Maartensdijk, The Netherlands
Maintainer, GNU Fortran 77: http://gcc.gnu.org/onlinedocs/g77_news.html
Join GNU Fortran 95: http://g95.sourceforge.net/ (under construction)
John David Anglin
2002-12-04 22:22:05 UTC
From my reading of the documentation, I think you need both.
I wonder if GET_MODE_ALIGNMENT could be redefined?
I have tested the following on hppa2.0w-hp-hpux11.11 and hppa64-hp-hpux11.11.
There are no regressions and it reduces the alignment of long doubles in a
simple manner. However, it does require modifying machmode.h to allow
defining GET_MODE_ALIGNMENT in the target headers.

Is this ok for main?

Sorry Steve, I didn't mean to usurp your work.

Dave
--
J. David Anglin ***@nrc.ca
National Research Council of Canada (613) 990-0752 (FAX: 952-6605)

2002-12-04 John David Anglin <***@hiauly1.hia.nrc.ca>

* machmode.h (GET_MODE_ALIGNMENT): Don't define if already defined.
* pa.h (GET_MODE_ALIGNMENT): Define to obtain 64-bit alignment for
TFmode on non 64-bit ports.

Index: machmode.h
===================================================================
RCS file: /cvsroot/gcc/gcc/gcc/machmode.h,v
retrieving revision 1.31
diff -u -3 -p -r1.31 machmode.h
--- machmode.h 1 Nov 2002 09:35:24 -0000 1.31
+++ machmode.h 4 Dec 2002 19:13:33 -0000
@@ -159,7 +159,9 @@ extern enum machine_mode get_best_mode P

extern unsigned get_mode_alignment PARAMS ((enum machine_mode));

+#ifndef GET_MODE_ALIGNMENT
#define GET_MODE_ALIGNMENT(MODE) get_mode_alignment (MODE)
+#endif

/* For each class, get the narrowest mode in that class. */

Index: config/pa/pa.h
===================================================================
RCS file: /cvsroot/gcc/gcc/gcc/config/pa/pa.h,v
retrieving revision 1.176
diff -u -3 -p -r1.176 pa.h
--- config/pa/pa.h 27 Nov 2002 02:34:15 -0000 1.176
+++ config/pa/pa.h 4 Dec 2002 19:13:36 -0000
@@ -478,6 +478,11 @@ do { \
to 128 bits to allow for lock semaphores in the stack frame.*/
#define BIGGEST_ALIGNMENT 128

+/* TFmode data only needs 64-bit alignment on non 64-bit ports. */
+#undef GET_MODE_ALIGNMENT
+#define GET_MODE_ALIGNMENT(MODE) \
+ (!TARGET_64BIT && (MODE) == TFmode ? 64 : get_mode_alignment (MODE))
+
/* Get around hp-ux assembler bug, and make strcpy of constants fast. */
#define CONSTANT_ALIGNMENT(CODE, TYPEALIGN) \
((TYPEALIGN) < 32 ? 32 : (TYPEALIGN))
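
[Editorial illustration, not part of the patch.] The patch relies on a common preprocessor pattern: the generic header only supplies a default for GET_MODE_ALIGNMENT when the target header has not already defined one. The sketch below shows the pattern in a single file. The three-mode enum, the alignment values returned by the stand-in get_mode_alignment, and the target_64bit variable are assumptions chosen for illustration and do not reflect the real tables in GCC.

```c
/* Stand-ins for GCC's mode enum and TARGET_64BIT; illustration only.  */
enum machine_mode { SImode, DFmode, TFmode };

static int target_64bit;

/* Stand-in for the generic alignment query.  The values are made up
   for this sketch.  */
static unsigned
get_mode_alignment (enum machine_mode mode)
{
  switch (mode)
    {
    case SImode: return 32;
    case DFmode: return 64;
    default:     return 128;	/* TFmode.  */
    }
}

/* Target header part (the role pa.h plays in the patch).  */
#undef GET_MODE_ALIGNMENT
#define GET_MODE_ALIGNMENT(MODE) \
  (!target_64bit && (MODE) == TFmode ? 64 : get_mode_alignment (MODE))

/* Generic header part (the role machmode.h plays): the default is only
   used when no target override exists.  */
#ifndef GET_MODE_ALIGNMENT
#define GET_MODE_ALIGNMENT(MODE) get_mode_alignment (MODE)
#endif

/* Helper so the effect is easy to observe.  */
static unsigned
tfmode_alignment (int is_64bit)
{
  target_64bit = is_64bit;
  return GET_MODE_ALIGNMENT (TFmode);
}
```

With the override in place, `tfmode_alignment (0)` yields 64 and `tfmode_alignment (1)` falls through to the generic value, which mirrors the stated intent of reducing long double alignment only on the non 64-bit ports.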
Steve Ellcey
2002-12-04 22:31:52 UTC
Post by John David Anglin
From my reading of the documentation, I think you need both.
I wonder if GET_MODE_ALIGNMENT could be redefined?
I have tested the following on hppa2.0w-hp-hpux11.11 and hppa64-hp-hpux11.11.
There are no regressions and it reduces the alignment of long doubles in a
simple manner. However, it does require modifying machmode.h to allow
defining GET_MODE_ALIGNMENT in the target headers.
Is this ok for main?
Sorry Steve, I didn't mean to usurp your work.
Dave
That's quite alright with me. This looks simpler than changing both
ADJUST_FIELD_ALIGN and DATA_ALIGNMENT (and possibly LOCAL_ALIGNMENT) to
get them all to match up.

Steve
John David Anglin
2002-12-05 02:14:03 UTC
So if I understand correctly, you are suggesting that we drop
BIGGEST_ALIGNMENT back to 64 and increase STACK_BOUNDARY to 128.
I think then the STARTING_FRAME_OFFSET would have to increase
to 16. Otherwise, we never have a zero frame size. This will
add two more unused slots in the stack frame which I don't much
like.
this patch introduces a new macro STARTING_FRAME_PHASE (documented).
There is no need for a new macro. The frame pointer _must_ be
emit-rtl.c:4719: REGNO_POINTER_ALIGN (FRAME_POINTER_REGNUM) = STACK_BOUNDARY;
therefore it must always be the case that
sb = STACK_BOUNDARY / BITS_PER_UNIT;
off = STARTING_FRAME_OFFSET % sb;
STARTING_FRAME_PHASE == (off ? sb - off : 0)
In diagnosing why we never had a zero frame size on hppa64, I found
that when STARTING_FRAME_PHASE % sb is non zero you can never get a
frame size of zero. This seems like a bug as it wastes stack space
if you increase STARTING_FRAME_OFFSET so that off==0.
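
[Editorial illustration, not part of the original message.] The relation quoted above can be written as a small function. This is a sketch only: the function name is hypothetical, the macro values are passed in as plain parameters, and the starting offset is assumed to be non-negative so that the C remainder operator behaves as in the quoted formula.

```c
/* Illustration only.  Computes the phase implied by the quoted relation:
     sb  = STACK_BOUNDARY / BITS_PER_UNIT;
     off = STARTING_FRAME_OFFSET % sb;
     phase = off ? sb - off : 0
   Assumes STARTING_FRAME_OFFSET >= 0.  */
static int
frame_phase (int stack_boundary, int bits_per_unit,
	     int starting_frame_offset)
{
  int sb = stack_boundary / bits_per_unit;
  int off = starting_frame_offset % sb;

  return off ? sb - off : 0;
}
```

With STACK_BOUNDARY at 128 bits and 8-bit units, a starting offset of 16 gives a phase of 0, which is the case discussed above. An offset of 8 under the same boundary (a value picked only as an example) gives a phase of 8.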

dave
--
J. David Anglin ***@nrc.ca
National Research Council of Canada (613) 990-0752 (FAX: 952-6605)
John David Anglin
2003-01-17 17:15:24 UTC
* cse.c (cse_insn): Avoid RTL sharing when updating the RETVAL
insn's notes following a substitution inside a libcall.
This doesn't fix the hppa64 problem. I will try to investigate further.
This is the insn that causes the ICE:

regclass.i.12.loop:
(insn 599 598 600 0000000000000000 (set (reg:DI 263)
(mult:DI (zero_extend:DI (subreg:SI (reg:DI 262) 4))
(zero_extend:DI (subreg:SI (reg/v:DI 157) 4)))) -1 (nil)
(nil))

regclass.i.13.bypass:
(insn 599 598 600 21 0000000000000000 (set (reg:DI 263)
(mult:DI (zero_extend:DI (subreg:SI (const_int -44 [0xffffffffffffffd4]) 4))
(zero_extend:DI (subreg:SI (reg/v:DI 157) 4)))) -1 (nil)
(nil))

regclass.i.19.life:
(insn 599 598 600 18 0000000000000000 (set (reg:DI 263)
(mult:DI (zero_extend:DI (subreg:SI (const_int -44 [0xffffffffffffffd4]) 4))
(zero_extend:DI (subreg:SI (reg/v:DI 157) 4)))) -1 (nil)
(expr_list:REG_DEAD (reg/v:DI 157)
(nil)))

The ICE occurs because combine tries to simplify it and it can't because
of the constant substitution in the regclass.i.13.bypass pass. It would
seem to me that the subreg should be simplified when the substitution for
DI 262 is made. After that, the mode information is lost.
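
[Editorial illustration, not part of the original message.] To make the point concrete: on a big-endian target, byte offset 4 of an 8-byte DImode value selects the low 32 bits. If the subreg is folded at the moment the DImode constant is substituted, the result is simply the low word of -44. Once only a modeless const_int is left inside the subreg, the width of the original value is no longer recorded. The sketch below does the folding by hand. The function name is hypothetical and only byte offsets 0 and 4 are meaningful.

```c
#include <stdint.h>

/* Illustration only: the SImode word at BYTE_OFFSET (0 or 4) of a
   DImode VALUE, using big-endian byte numbering.  Offset 0 is the high
   word and offset 4 is the low word.  */
static uint32_t
be_subreg_si_of_di (int64_t value, unsigned byte_offset)
{
  unsigned shift = (4 - byte_offset) * 8;	/* 32 for offset 0, 0 for 4.  */

  return (uint32_t) ((uint64_t) value >> shift);
}
```

For the insn above, `be_subreg_si_of_di (-44, 4)` gives 0xffffffd4, which is -44 again when read as a 32-bit signed value.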

Dave
--
J. David Anglin ***@nrc-cnrc.gc.ca
National Research Council of Canada (613) 990-0752 (FAX: 952-6605)
l***@redhat.com
2003-01-17 17:26:51 UTC
Post by John David Anglin
* cse.c (cse_insn): Avoid RTL sharing when updating the RETVAL
insn's notes following a substitution inside a libcall.
This doesn't fix the hppa64 problem. I will try to investigate further.
(insn 599 598 600 0000000000000000 (set (reg:DI 263)
(mult:DI (zero_extend:DI (subreg:SI (reg:DI 262) 4))
(zero_extend:DI (subreg:SI (reg/v:DI 157) 4)))) -1 (nil)
(nil))
(insn 599 598 600 21 0000000000000000 (set (reg:DI 263)
(mult:DI (zero_extend:DI (subreg:SI (const_int -44 [0xffffffffffffffd4]) 4))
(zero_extend:DI (subreg:SI (reg/v:DI 157) 4)))) -1 (nil)
(nil))
(insn 599 598 600 18 0000000000000000 (set (reg:DI 263)
(mult:DI (zero_extend:DI (subreg:SI (const_int -44 [0xffffffffffffffd4]) 4))
(zero_extend:DI (subreg:SI (reg/v:DI 157) 4)))) -1 (nil)
(expr_list:REG_DEAD (reg/v:DI 157)
(nil)))
The ICE occurs because combine tries to simplify it and it can't because
of the constant substitution in the regclass.i.13.bypass pass. It would
seem to me that the subreg should be simplified when the substitution for
DI 262 is made. After that, the mode information is lost.
Agreed.

jeff
Jan Hubicka
2003-01-17 18:25:30 UTC
Post by l***@redhat.com
Post by John David Anglin
* cse.c (cse_insn): Avoid RTL sharing when updating the RETVAL
insn's notes following a substitution inside a libcall.
This doesn't fix the hppa64 problem. I will try to investigate further.
(insn 599 598 600 0000000000000000 (set (reg:DI 263)
(mult:DI (zero_extend:DI (subreg:SI (reg:DI 262) 4))
(zero_extend:DI (subreg:SI (reg/v:DI 157) 4)))) -1 (nil)
(nil))
(insn 599 598 600 21 0000000000000000 (set (reg:DI 263)
(mult:DI (zero_extend:DI (subreg:SI (const_int -44 [0xffffffffffffffd4]) 4))
(zero_extend:DI (subreg:SI (reg/v:DI 157) 4)))) -1 (nil)
(nil))
(insn 599 598 600 18 0000000000000000 (set (reg:DI 263)
(mult:DI (zero_extend:DI (subreg:SI (const_int -44 [0xffffffffffffffd4]) 4))
(zero_extend:DI (subreg:SI (reg/v:DI 157) 4)))) -1 (nil)
(expr_list:REG_DEAD (reg/v:DI 157)
(nil)))
The ICE occurs because combine tries to simplify it and it can't because
of the constant substitution in the regclass.i.13.bypass pass. It would
seem to me that the subreg should be simplified when the substitution for
DI 262 is made. After that, the mode information is lost.
Agreed.
When replacement is made by validate_replace_reg, this is done
automatically. I think bypass pass should use it consistently...

Honza
Post by l***@redhat.com
jeff
Roger Sayle
2003-01-17 18:39:05 UTC
Hi Jan and Jeff,
Post by Jan Hubicka
Post by l***@redhat.com
Post by John David Anglin
The ICE occurs because combine tries to simplify it and it can't because
of the constant substitution in the regclass.i.13.bypass pass. It would
seem to me that the subreg should be simplified when the substitution for
DI 262 is made. After that, the mode information is lost.
Agreed.
When replacement is made by validate_replace_reg, this is done
automatically. I think bypass pass should use it consistently...
No new RTL modification code was introduced splitting GCSE and
moving the jump bypassing pass after the loop optimizations.
Even prior to that, jump bypassing itself only affects branch
instructions and doesn't modify RTL expressions at all.

I believe that the problems on hppa64 and powerpc will turn out to
be more latent bugs in the existing GCSE or CSE constant propagation
code, just like the one in cse_insn that it has already uncovered
on H8300.


I know the rules, and am more than happy to take responsibility
for latent bugs that are exposed by my changes. If I'm right,
these fixes should also be applied to the 3.3 branch, as they'll
predate my recent changes.

Roger
--
David Edelsohn
2003-01-17 22:33:20 UTC
Post by Roger Sayle
I believe that the problems on hppa64 and powerpc will turn out to
be more latent bugs in the existing GCSE or CSE constant propagation
code, just like the one in cse_insn that it has already uncovered
on H8300.
This does appear to be the case for the PowerPC segfault. After
further investigation, I can construct a simplified testcase which fails
with or without the jump bypass patch. The jump bypass patch simply
causes the latent bug to occur in the complete version of the function
that appears in the benchmark.

David
Dale Johannesen
2003-01-17 23:55:33 UTC
Post by David Edelsohn
Post by Roger Sayle
I believe that the problems on hppa64 and powerpc will turn out to
be more latent bugs in the existing GCSE or CSE constant propagation
code, just like the one in cse_insn that it has already uncovered
on H8300.
This does appear to be the case for the PowerPC segfault. After
further investigation, I can construct a simplified testcase which fails
with or without the jump bypass patch. The jump bypass patch simply
causes the latent bug to occur in the complete version of the function
that appears in the benchmark.
Right. I've tracked this down to a lurker in the ppc-specific code.
Patch coming.
l***@redhat.com
2003-01-17 22:49:07 UTC
Post by Roger Sayle
I believe that the problems on hppa64 and powerpc will turn out to
be more latent bugs in the existing GCSE or CSE constant propagation
code, just like the one in cse_insn that it has already uncovered
on H8300.
I agree completely.

It's worth noting that the code we're tripping in the PA backend is
notoriously flakey -- even more so for PA64. If it wasn't such a
huge win to use xmpyu I'd suggest we drop the damn thing.


Jeff
David Edelsohn
2003-01-17 17:50:33 UTC
By the way, your original bypass patch also causes a PowerPC
benchmark to segfault. It looks like another code sharing problem, but I
am still trying to track it down.

David
David Edelsohn
2003-01-18 03:56:16 UTC
BTW, now that Dale fixed the latent bug in the GCC MD file, one of
the benchmark testcases (not the one that segfaulted) shows a dramatic
performance improvement after your patch.

David
John David Anglin
2003-01-19 01:57:12 UTC
I have determined by elimination that the ifcvt pass reordering
* toplev.c (dump_file_index): Add DFI_ce3.
(dump_file_info): Likewise.
(rest_of_compilation): Run first ifcvt pass before tracer.
results in the failure of g77.f-torture/execute/970625-2.f on the PA
(hppa-unknown-linux-gnu, hppa2.0w-hp-hpux11* and hppa64-hp-hpux11*).
/home/dave/gcc-3.3/gcc/gcc/testsuite/g77.f-torture/execute/970625-2.f:74: internal compiler error: in convert_move, at expr.c:1303
We have
Breakpoint 1, convert_move (to=0x401ba240, from=0x401ba270, unsignedp=0)
at ../../gcc/gcc/expr.c:1303
1303 abort ();
(gdb) p debug_rtx (to)
(reg:CCFP 142)
$1 = void
(gdb) p debug_rtx (from)
(reg:SI 143)
Interesting.
This looks like a latent PA bug being uncovered, but I will try to check it next
week.
Here is a patch to fix the above. It occurs as a result of the installation
of placeholders to set the CCFP register. These were installed last
September to improve optimization of if statements on the PA. Under some
circumstances, the CCFP register can become the destination for
the store flag from a general operand. The enclosed patch restricts
setting the store flag to destinations with a mode class in MODE_INT.
Possibly this is overly restrictive, but it seemed a reasonable choice.

Tested on hppa64-hp-hpux11.11 with no regressions.

Ok for main and branch?

Dave
--
J. David Anglin ***@nrc-cnrc.gc.ca
National Research Council of Canada (613) 990-0752 (FAX: 952-6605)

2003-01-18 John David Anglin <***@nrc-cnrc.gc.ca>

* ifcvt.c (noce_emit_store_flag): Don't emit store flag if mode class
of destination is not MODE_INT.

Index: ifcvt.c
===================================================================
RCS file: /cvsroot/gcc/gcc/gcc/ifcvt.c,v
retrieving revision 1.105
diff -u -3 -p -r1.105 ifcvt.c
--- ifcvt.c 19 Sep 2002 01:07:10 -0000 1.105
+++ ifcvt.c 18 Jan 2003 21:09:28 -0000
@@ -645,8 +645,8 @@ noce_emit_store_flag (if_info, x, revers
end_sequence ();
}

- /* Don't even try if the comparison operands are weird. */
- if (cond_complex)
+ /* Don't even try if the comparison operands or the mode of X are weird. */
+ if (cond_complex || GET_MODE_CLASS (GET_MODE (x)) != MODE_INT)
return NULL_RTX;

return emit_store_flag (x, code, XEXP (cond, 0),
Richard Henderson
2003-01-20 17:31:33 UTC
Post by John David Anglin
* ifcvt.c (noce_emit_store_flag): Don't emit store flag if mode class
of destination is not MODE_INT.
Use SCALAR_INT_MODE_P and this is ok.
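
A simplified model of the difference between the two checks, offered as a sketch only. The enums below are toy stand-ins rather than GCC's real machine-mode definitions, and the shape given for SCALAR_INT_MODE_P (accepting both the MODE_INT and MODE_PARTIAL_INT classes) is an assumption based on its usual definition, not something taken from this thread.

```c
#include <stdbool.h>

/* Toy stand-ins for GCC's mode classes; illustrative only.  */
enum toy_mode_class
{
  TOY_MODE_INT,
  TOY_MODE_PARTIAL_INT,
  TOY_MODE_CC,
  TOY_MODE_FLOAT
};

/* Toy modes: an SImode-like mode, a partial-int mode, a CCFPmode-like
   mode and a DFmode-like mode.  */
enum toy_mode { TOY_SI, TOY_PSI, TOY_CCFP, TOY_DF };

enum toy_mode_class
toy_mode_class_of (enum toy_mode mode)
{
  switch (mode)
    {
    case TOY_SI:   return TOY_MODE_INT;
    case TOY_PSI:  return TOY_MODE_PARTIAL_INT;
    case TOY_CCFP: return TOY_MODE_CC;
    default:       return TOY_MODE_FLOAT;
    }
}

/* The check in the posted patch: only MODE_INT destinations pass.  */
bool
toy_is_mode_int (enum toy_mode mode)
{
  return toy_mode_class_of (mode) == TOY_MODE_INT;
}

/* Assumed shape of SCALAR_INT_MODE_P: MODE_INT or MODE_PARTIAL_INT.  */
bool
toy_scalar_int_mode_p (enum toy_mode mode)
{
  enum toy_mode_class c = toy_mode_class_of (mode);
  return c == TOY_MODE_INT || c == TOY_MODE_PARTIAL_INT;
}
```

Under this model both predicates reject the CCFP-like destination that triggered the abort in convert_move; they differ only in that the second also admits partial-integer modes.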


r~
Ulrich Weigand
2003-01-24 03:44:43 UTC
Secondly, if the condition hits, the code in question re-forms
the address from
(plus (plus frame-pointer index) constant)
to
(plus (plus frame-pointer constant) index)
which is non-canonical RTL. That would not be a problem if
-as intended- the (plus frame-pointer constant) term got
reloaded into a register. However, find_reloads is called
multiple times, and only after the last call will actual
reloads be made. Unfortunately, this code segment will
*modify* the passed-in address RTX in-place on *every*
call of find_reloads. This means after the return from
the first find_reloads call out of calculate_needs_all_insns,
the main insn stream will contain this piece of invalid RTL.
While this is correct, it actually does not really matter.
*If* the condition when to enter this block were correct,
the constant term would be invalid for an address. This
can only happen if that constant was in fact introduced
by eliminate_regs_in_insn just before the call to find_reloads.

In that case, it does not matter that the address is
modified in place, as calculate_needs_all_insns will
throw away the insn body anyway to get rid of the changes
done by eliminate_regs.

The reason why I am seeing the ICE on s390 is that this
block is entered -due to the incorrect condition- even
for addresses where the constant term is valid; such
addresses might have been in the insn stream even before
register elimination, and thus calculate_needs_all_insns
will not undo the modification ...

Thus, the only bug that needs to be fixed is in fact
the incorrect condition; the rest should be fine as is.
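
The idea behind the corrected condition can be sketched outside GCC: temporarily replace the part that would be reloaded with a register placeholder, ask whether the address is then valid, and restore the original before returning. Everything in the sketch (the toy_addr struct, the 0..4095 displacement range, the validity rule) is a made-up model for illustration, not reload's data structures or the real s390 addressing rules.

```c
#include <stdbool.h>

/* TOY_EXPR marks an operand that is not yet in a register.  */
enum toy_kind { TOY_REG, TOY_EXPR };

/* Made-up address model: base + index + displacement.  */
struct toy_addr
{
  enum toy_kind base;
  enum toy_kind index;
  long disp;
};

/* Made-up validity rule: both operands are registers and the
   displacement fits an assumed 12-bit unsigned field.  */
bool
toy_memory_address_p (const struct toy_addr *ad)
{
  return ad->base == TOY_REG && ad->index == TOY_REG
         && ad->disp >= 0 && ad->disp <= 4095;
}

/* Would AD be valid if *PART were reloaded into a register?
   Substitute a register for *PART, test, and restore *PART before
   returning, so that AD is never left modified.  */
bool
toy_maybe_memory_address_p (struct toy_addr *ad, enum toy_kind *part)
{
  enum toy_kind saved = *part;
  bool ok;

  *part = TOY_REG;
  ok = toy_memory_address_p (ad);
  *part = saved;

  return ok;
}
```

For an address whose index is not yet a register but whose displacement is small, the plain check fails while the "maybe" check succeeds, so a condition of the form `! toy_maybe_memory_address_p (...)` no longer enters the block that reassociates the constant. With an out-of-range displacement both checks fail and the block is still entered.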

The following patch implements this, and also fixes a
stupid bug in the original patch: the lines
! && ! maybe_memory_address_p (mode, ad, &XEXP (XEXP (ad, 0), 0)))
and
! && ! maybe_memory_address_p (mode, ad, &XEXP (XEXP (ad, 0), 1)))
need to be swapped, of course.

Also, I've included a test case that shows the ICE on s390.

Bootstrapped/regtested on s390-ibm-linux and s390x-ibm-linux,
on gcc 3.3 branch and mainline.

OK to apply? Should this be considered for 3.2 as well?


ChangeLog:

gcc/
* reload.c (maybe_memory_address_p): New function.
(find_reloads_address): Use it instead of memory_address_p.

gcc/testsuite/
* gcc.dg/20030123-1.c: New test.


Index: gcc/reload.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/reload.c,v
retrieving revision 1.178.2.4.2.4
diff -c -p -r1.178.2.4.2.4 reload.c
*** gcc/reload.c 24 Oct 2002 08:59:49 -0000 1.178.2.4.2.4
--- gcc/reload.c 23 Jan 2003 20:21:16 -0000
*************** static int alternative_allows_memconst P
*** 257,262 ****
--- 257,263 ----
static rtx find_reloads_toplev PARAMS ((rtx, int, enum reload_type, int,
int, rtx, int *));
static rtx make_memloc PARAMS ((rtx, int));
+ static int maybe_memory_address_p PARAMS ((enum machine_mode, rtx, rtx *));
static int find_reloads_address PARAMS ((enum machine_mode, rtx *, rtx, rtx *,
int, enum reload_type, int, rtx));
static rtx subst_reg_equivs PARAMS ((rtx, rtx));
*************** make_memloc (ad, regno)
*** 4545,4550 ****
--- 4546,4572 ----
return tem;
}

+ /* Returns true if AD could be turned into a valid memory reference
+ to mode MODE by reloading the part pointed to by PART into a
+ register. */
+
+ static int
+ maybe_memory_address_p (mode, ad, part)
+ enum machine_mode mode;
+ rtx ad;
+ rtx *part;
+ {
+ int retv;
+ rtx tem = *part;
+ rtx reg = gen_rtx_REG (GET_MODE (tem), max_reg_num ());
+
+ *part = reg;
+ retv = memory_address_p (mode, ad);
+ *part = tem;
+
+ return retv;
+ }
+
/* Record all reloads needed for handling memory address AD
which appears in *LOC in a memory reference to mode MODE
which itself is found in location *MEMREFLOC.
*************** find_reloads_address (mode, memrefloc, a
*** 4844,4850 ****
|| XEXP (XEXP (ad, 0), 0) == arg_pointer_rtx
#endif
|| XEXP (XEXP (ad, 0), 0) == stack_pointer_rtx)
! && ! memory_address_p (mode, ad))
{
*loc = ad = gen_rtx_PLUS (GET_MODE (ad),
plus_constant (XEXP (XEXP (ad, 0), 0),
--- 4866,4872 ----
|| XEXP (XEXP (ad, 0), 0) == arg_pointer_rtx
#endif
|| XEXP (XEXP (ad, 0), 0) == stack_pointer_rtx)
! && ! maybe_memory_address_p (mode, ad, &XEXP (XEXP (ad, 0), 1)))
{
*loc = ad = gen_rtx_PLUS (GET_MODE (ad),
plus_constant (XEXP (XEXP (ad, 0), 0),
*************** find_reloads_address (mode, memrefloc, a
*** 4869,4875 ****
|| XEXP (XEXP (ad, 0), 1) == arg_pointer_rtx
#endif
|| XEXP (XEXP (ad, 0), 1) == stack_pointer_rtx)
! && ! memory_address_p (mode, ad))
{
*loc = ad = gen_rtx_PLUS (GET_MODE (ad),
XEXP (XEXP (ad, 0), 0),
--- 4891,4897 ----
|| XEXP (XEXP (ad, 0), 1) == arg_pointer_rtx
#endif
|| XEXP (XEXP (ad, 0), 1) == stack_pointer_rtx)
! && ! maybe_memory_address_p (mode, ad, &XEXP (XEXP (ad, 0), 0)))
{
*loc = ad = gen_rtx_PLUS (GET_MODE (ad),
XEXP (XEXP (ad, 0), 0),
*** /dev/null Thu Sep 19 11:33:10 2002
--- gcc/testsuite/gcc.dg/20030123-1.c Thu Jan 23 21:27:33 2003
***************
*** 0 ****
--- 1,17 ----
+ /* This used to ICE due to a reload bug on s390*. */
+
+ /* { dg-do compile { target s390*-*-* } } */
+ /* { dg-options "-O2" } */
+
+ void func (char *p);
+
+ void test (void)
+ {
+ char *p = alloca (4096);
+ long idx;
+
+ asm ("" : "=r" (idx) : : "1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12");
+
+ func (p + idx + 1);
+ }
+
--
Dr. Ulrich Weigand
***@informatik.uni-erlangen.de
John David Anglin
2003-02-01 21:40:13 UTC
This patch is needed to fix a problem arising from the rewrite of
libiberty/pexecute.c. The rewrite changed pwait so that it now uses
waitpid instead of wait. Would someone please review:

<http://gcc.gnu.org/ml/gcc-patches/2003-01/msg02069.html>.
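
For readers unfamiliar with the distinction, here is a minimal POSIX sketch of waiting for one specific child with waitpid rather than for any child with wait. This is not the libiberty code; wait_for_child is a hypothetical name, and retrying on EINTR is an assumption about what a caller would want.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Hypothetical helper: wait for the child PID only, unlike wait (),
   which returns whichever child changes state first.  Returns PID on
   success and stores the raw status in *STATUS; returns -1 on error
   with errno set by waitpid.  */
pid_t
wait_for_child (pid_t pid, int *status)
{
  pid_t got;

  /* Retry if the call was interrupted by a signal.  */
  do
    got = waitpid (pid, status, 0);
  while (got == -1 && errno == EINTR);

  return got;
}
```

A caller would pass the pid returned by fork and then inspect the status with WIFEXITED and WEXITSTATUS.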

Dave
--
J. David Anglin ***@nrc-cnrc.gc.ca
National Research Council of Canada (613) 990-0752 (FAX: 952-6605)
Zack Weinberg
2003-02-01 21:43:51 UTC
Post by John David Anglin
This patch is needed to fix a problem arising from the rewrite of
libiberty/pexecute.c. The rewrite changed pwait so that it now uses
<http://gcc.gnu.org/ml/gcc-patches/2003-01/msg02069.html>.
It looks good to me, but I can't approve it.

zw
Geoff Keating
2003-02-02 00:22:13 UTC
Post by John David Anglin
This patch is needed to fix a problem arising from the rewrite of
libiberty/pexecute.c. The rewrite changed pwait so that it now uses
<http://gcc.gnu.org/ml/gcc-patches/2003-01/msg02069.html>.
This is OK.
--
- Geoffrey Keating <***@geoffk.org>
John David Anglin
2003-02-03 05:02:44 UTC
I now have tests running on hppa2.0w-hp-hpux11.11, hppa64-hp-hpux11.00
and hppa-unknown-linux-gnu including the patch set for the other PR from
Franz Sirl. I will post the results as soon as available.
The hppa2.0w-hp-hpux11.11 results are complete and identical to the previous
run without Franz's patch:

<http://gcc.gnu.org/ml/gcc-testresults/2003-02/msg00130.html>.

I had to restart the hppa64-hp-hpux11.00 run. I used the HP linker
and it generates a warning on each link. This totally messes
up the testsuite results. I would like to apply the following patch
from Steve Ellcey to correct the problem. It has been on the main and
3.3 since early last October. The comment describes what it does. It
only affects hppa64-hp-hpux11* and doesn't affect code generation.

As you are doing prereleases, I will wait for your OK.

Dave
--
J. David Anglin ***@nrc-cnrc.gc.ca
National Research Council of Canada (613) 990-0752 (FAX: 952-6605)

2003-02-02 Steve Ellcey <***@cup.hp.com>

* config/pa/pa64-hpux.h (INIT_ENVIRONMENT): New.

Index: config/pa/pa64-hpux.h
===================================================================
RCS file: /cvsroot/gcc/gcc/gcc/config/pa/pa64-hpux.h,v
retrieving revision 1.9
diff -u -3 -p -r1.9 pa64-hpux.h
--- config/pa/pa64-hpux.h 21 Jan 2002 21:22:19 -0000 1.9
+++ config/pa/pa64-hpux.h 3 Feb 2003 04:38:07 -0000
@@ -232,3 +232,8 @@ do { \
#ifndef ASM_DECLARE_RESULT
#define ASM_DECLARE_RESULT(FILE, RESULT)
#endif
+
+/* If using HP ld do not call pxdb. Use size as a program that does nothing
+ and returns 0. /bin/true cannot be used because it is a script without
+ an interpreter. */
+#define INIT_ENVIRONMENT "LD_PXDB=/usr/ccs/bin/size"
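
As a rough illustration of how such a macro can take effect, the sketch below assumes the compiler driver hands INIT_ENVIRONMENT to putenv at startup, so that an HP ld started later inherits LD_PXDB. The function name apply_init_environment is hypothetical and the snippet is a standalone model, not the driver source.

```c
/* Ask for the POSIX/XSI declarations (putenv) unless the build
   already selected a feature level.  */
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 600
#endif

#include <stdlib.h>

#define INIT_ENVIRONMENT "LD_PXDB=/usr/ccs/bin/size"

/* Hypothetical stand-in for the driver's startup step.  putenv keeps
   a pointer to the string it is given, so the string must stay
   alive; a static array guarantees that.  Returns 0 on success.  */
int
apply_init_environment (void)
{
#ifdef INIT_ENVIRONMENT
  static char init_env[] = INIT_ENVIRONMENT;
  return putenv (init_env);
#else
  return 0;
#endif
}
```

After the call, any child process (here, the linker) sees LD_PXDB set to /usr/ccs/bin/size, which per the comment in the patch stands in as a program that does nothing and returns 0.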
Gabriel Dos Reis
2003-02-03 11:03:02 UTC
"John David Anglin" <***@hiauly1.hia.nrc.ca> writes:

[...]

| As you are doing prereleases, I will wait for your OK.

It is OK.

Thanks,

-- Gaby
John David Anglin
2003-02-03 16:25:54 UTC
Post by John David Anglin
I now have tests running on hppa2.0w-hp-hpux11.11, hppa64-hp-hpux11.00
and hppa-unknown-linux-gnu including the patch set for the other PR from
Franz Sirl. I will post the results as soon as available.
The hppa2.0w-hp-hpux11.11 are complete and identical to the previous
All test runs are complete and posted. There are no regressions.

Dave
--
J. David Anglin ***@nrc-cnrc.gc.ca
National Research Council of Canada (613) 990-0752 (FAX: 952-6605)
Gabriel Dos Reis
2003-02-03 16:54:20 UTC
"John David Anglin" <***@hiauly1.hia.nrc.ca> writes:

| > > I now have tests running on hppa2.0w-hp-hpux11.11, hppa64-hp-hpux11.00
| > > and hppa-unknown-linux-gnu including the patch set for the other PR from
| > > Franz Sirl. I will post the results as soon as available.
| >
| > The hppa2.0w-hp-hpux11.11 are complete and identical to the previous
| > run without Franz's patch:
|
| All test runs are complete and posted. There are no regressions.

Thanks for the report. Please commit the set of patches you tested.

This hopefully will be the last patch to commit.

-- Gaby
John David Anglin
2003-02-03 18:02:32 UTC
Post by Gabriel Dos Reis
Thanks for the report. Please commit the set of patches you tested.
Done.

Dave
--
J. David Anglin ***@nrc-cnrc.gc.ca
National Research Council of Canada (613) 990-0752 (FAX: 952-6605)
John David Anglin
2003-02-04 21:20:39 UTC
I'm getting a bootstrap failure on HPUX with current CVS mainline,
probably caused by the recent "-Werror" change.
This may have been obvious but you can avoid the error until the problem
is fixed by configuring with "--disable-werror".

Dave
--
J. David Anglin ***@nrc-cnrc.gc.ca
National Research Council of Canada (613) 990-0752 (FAX: 952-6605)