Does this mean that an x86 O3 CPU will never squash an RMW instruction? I
am posting an instruction + protocol trace obtained from O3 and Ruby.
In the first portion, you can see that the O3 CPU issues a locked RMW with
the read part having sn = 3051 and the write part having sn = 3052. In the
second portion, you can see that 3051 and 3052 are squashed, and in
the third portion of the trace, they are committed. There are several
things that I am not able to understand. First, why is the RMW squashed, since
the x86 architecture has to commit the instruction? Secondly, if the RMW was
being executed speculatively, what mechanism exists for informing the cache
controller that the instruction has been squashed? Thirdly, why was the
instruction committed later on, when it was originally squashed?
FullO3CPU: Ticking main, FullO3CPU.
21254500 1 Seq Begin > [0x840, line 0x840] IFETCH
21254500: system.cpu1.iew.lsq.thread0: Executing load PC (0x4002bd=>0x4002bf).(0=>1), [sn:3051]
21254500: system.cpu1.iew.lsq.thread0: Read called, load idx: 0, store idx: -1, storeHead: 23 addr: 0x95b84
21254500: system.cpu1.iew.lsq.thread0: Doing memory access for inst [sn:3051] PC (0x4002bd=>0x4002bf).(0=>1)
21254500 1 Seq Begin > [0x95b84, line 0x95b80] Locked_RMW_Read
21254500: system.cpu1.iew.lsq.thread0: Executing store PC (0x4002bd=>0x4002bf).(1=>2) [sn:3052]
21254500: system.cpu1.iew.lsq.thread0: Doing write to store idx 23, addr 0x95b84 data ^A | storeHead:23 [sn:3052]
:
:
:
21256500: system.cpu1.iew.lsq.thread0: Squashing until [sn:2993]!(Loads:2 Stores:2)
21256500: system.cpu1.iew.lsq.thread0: Load Instruction PC (0x4002c7=>0x4002c8).(0=>1) squashed, [sn:3060]
21256500: system.cpu1.iew.lsq.thread0: Load Instruction PC (0x4002bd=>0x4002bf).(0=>1) squashed, [sn:3051]
21256500: system.cpu1.iew.lsq.thread0: Store Instruction PC (0x4002c4=>0x4002c7).(0=>1) squashed, idx:24 [sn:3059]
21256500: system.cpu1.iew.lsq.thread0: Store Instruction PC (0x4002bd=>0x4002bf).(1=>2) squashed, idx:23 [sn:3052]
:
:
:
21258000: system.cpu1: Removing committed instruction [tid:0] PC (0x4002bd=>0x4002bf).(0=>1) [sn:3013]
21258000: system.cpu1: Removing committed instruction [tid:0] PC (0x4002bd=>0x4002bf).(1=>2) [sn:3014]
21258000: system.cpu1: Removing committed instruction [tid:0] PC (0x4002bd=>0x4002bf).(2=>3) [sn:3015]
21258000: system.cpu1: Removing committed instruction [tid:0] PC (0x4002bd=>0x4002bf).(0=>1) [sn:3032]
21258000: system.cpu1: Removing committed instruction [tid:0] PC (0x4002bd=>0x4002bf).(1=>2) [sn:3033]
21258000: system.cpu1: Removing committed instruction [tid:0] PC (0x4002bd=>0x4002bf).(2=>3) [sn:3034]
21258000: system.cpu1: Removing committed instruction [tid:0] PC (0x4002bd=>0x4002bf).(0=>1) [sn:3051]
21258000: system.cpu1: Removing committed instruction [tid:0] PC (0x4002bd=>0x4002bf).(1=>2) [sn:3052]
21258000: system.cpu1: Removing committed instruction [tid:0] PC (0x4002bd=>0x4002bf).(2=>3) [sn:3053]
Thanks
Nilay
Post by Steve Reinhardt
Hi Nilay,
No, x86 locked RMW accesses are different from Alpha/MIPS LL/SC, and they
are not allowed to be interrupted (once the cache begins the sequence).
In the old days, all that the gem5 CPU models implemented was LL/SC, and the
LOCKED flag meant LLSC. A while ago we renamed the old LOCKED flag to LLSC
and added a new LOCKED flag that means x86 atomic RMW.
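
To make the distinction concrete, here is a minimal sketch of LL/SC semantics on a single cache line. The names (`Line`, `loadLinked`, `plainStore`, `storeConditional`) are made up for illustration and are not gem5 or Ruby code, and clearing the reservation on every intervening store is a simplification. The point is only that a store-conditional has a failure path that software must handle, whereas an x86 locked RMW has no such path once the cache begins the sequence.

```cpp
// Illustrative model only; hypothetical names, not gem5 code.
struct Line {
    int value = 0;
    int linkedCpu = -1;  // CPU holding an LL reservation, -1 if none
};

// Load-linked: read the value and record a reservation for this CPU.
inline int loadLinked(Line &line, int cpu) {
    line.linkedCpu = cpu;
    return line.value;
}

// Ordinary store: writes the value and clears any outstanding
// reservation (a simplification of real hardware behavior).
inline void plainStore(Line &line, int /*cpu*/, int v) {
    line.value = v;
    line.linkedCpu = -1;
}

// Store-conditional: succeeds only if this CPU's reservation is intact.
inline bool storeConditional(Line &line, int cpu, int v) {
    if (line.linkedCpu != cpu)
        return false;  // reservation lost, software must retry
    line.value = v;
    line.linkedCpu = -1;
    return true;
}
```

In this model, an LL by CPU 0 followed by a store from CPU 1 makes CPU 0's SC return false, and the usual response is to loop back to the LL. A locked RMW offers no equivalent "retry" outcome to the program.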
The situation you're observing is that the classic memory system only
implements LLSC and not LOCKED. In contrast, I believe Ruby implements both
LOCKED and LLSC.
    if (req->isLocked())
        warn_once("Classic cache does not implement locked accesses. "
                  "MP execution could be wrong!\n");
to the classic cache code so people know that this is the case.
Steve
Post by Nilay Vaish
Hi
I am trying to make the O3 CPU work with Ruby, but I am running into a
problem with the implementation of locked RMW in Ruby. Currently, when the read
part of an RMW is issued, Ruby puts the block on a special list. The block is
taken off that list when the write part of the RMW is issued. If any other
processor issues a read / write request for that block in between the RMW's
read and write operations, the request is delayed until the block is
unlocked. This means that the RMW can never fail and the write request always
needs to be issued.
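
The scheme described above can be sketched roughly as follows. This is not Ruby's actual code: `LockedLineTable`, its method names, and the 64-byte line size are all assumptions made for illustration. It only models the bookkeeping: the read half puts the line on the list, the write half takes it off, and requests from other processors are reported as needing to be delayed in between.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

// Hypothetical names throughout; this is not Ruby's implementation.
using Addr = std::uint64_t;

constexpr Addr kLineBytes = 64;  // assumed cache line size

inline Addr lineAddress(Addr a) { return a & ~(kLineBytes - 1); }

class LockedLineTable {
  public:
    // True if a request from `cpu` to `addr` must wait because another
    // CPU is between the read and write halves of a locked RMW.
    bool mustDelay(int cpu, Addr addr) const {
        auto it = owner_.find(lineAddress(addr));
        return it != owner_.end() && it->second != cpu;
    }

    // Read half of a locked RMW: put the line on the special list.
    void lockedRead(int cpu, Addr addr) {
        assert(!mustDelay(cpu, addr));  // caller should have stalled
        owner_[lineAddress(addr)] = cpu;
    }

    // Write half: take the line off the list. Returns false if this
    // CPU did not hold the lock, which would indicate a protocol bug.
    bool lockedWrite(int cpu, Addr addr) {
        auto it = owner_.find(lineAddress(addr));
        if (it == owner_.end() || it->second != cpu)
            return false;
        owner_.erase(it);
        return true;
    }

  private:
    std::unordered_map<Addr, int> owner_;  // locked line -> owning CPU
};
```

Note that nothing in this sketch ever releases a line except the matching write, which is why a squashed read half with no following write would leave the line locked.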
Reading the code of the classic memory system, it seems that it allows
the block to be given away if some other processor requests it. This
means that the classic memory system allows an RMW to fail.
My question is: which of these behaviors is actually implemented in x86? As I
understand it, LL/SC is allowed to fail in the MIPS and Alpha architectures. I
would assume that the same holds true for x86 as well. Is that the case or not?
Thanks
Nilay
_______________________________________________
gem5-dev mailing list
http://m5sim.org/mailman/listinfo/gem5-dev