from elsewhere, an assembler


cr88192

2007-04-05 22:45:58 UTC

well, I have recently been posting some on c.l.a.x, and now I have come here.
as I have recently heard, the group is human moderated (vs. machine
moderated), so credit goes to the moderator for actually being quick about it
(I have seen many moderated groups where one may wait a week or more, and
then forget they ever posted until something shows up, or more often, it
never does...).

so, what I will mention is this:
for my own projects I have written an assembler (mostly since january).

now, what is different from most other assemblers?
this one primarily targets in-memory compilation.
at present, I am not aware of other particularly similar projects.

if anyone feels like commenting, that would be cool.

General

basically, its task is to take the output of various JIT compilers, which
are also part of my projects (my first JIT compiler was for my script lang,
a newer one is for a more recent C-compiler backend).

I use a (mostly) nasm-like syntax, albeit without any expression handling or
macro facilities at present, if ever (not needed for JIT, which is the
primary purpose).

otherwise, at present it can open the host exe and rip out its symbol table,
so that dynamically assembled code can refer to symbols in the host app
(this is windows-specific; on linux it could probably be handled via libdl,
but that is not yet tested). I am considering possibly adding modular
loading of COFF files as well.
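
As a rough illustration of the libdl route mentioned above, the sketch below looks up a symbol in the running process from Python. It assumes a Linux-like platform where `ctypes.CDLL(None)` wraps `dlopen(NULL)`; the helper name `host_symbol_address` is made up for this example and has nothing to do with the assembler's actual API.

```python
import ctypes

def host_symbol_address(name):
    """Return the address of `name` in the host process, or None if absent.

    Assumption: a POSIX system where CDLL(None) is dlopen(NULL), so the
    lookup sees the symbols already loaded into the running process.
    """
    host = ctypes.CDLL(None)
    try:
        func = getattr(host, name)  # performs the dlsym lookup
    except AttributeError:
        return None
    return ctypes.cast(func, ctypes.c_void_p).value

addr = host_symbol_address("printf")
print(hex(addr) if addr else "printf not found")
```

An assembler could use an address obtained this way to resolve a `call printf` in dynamically assembled code, in place of the ripped-out exe symbol table.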

primarily it supports x86-32, but should in theory be able to handle 16 and
64 bits as well.

16-bit is likely to have problems, i.e., I am unsure of the exact relation
between ModR/M format, cpu mode, and address-size overrides. additionally,
'far' forms of instructions are generally not implemented.
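
For what it is worth, on x86 the ModR/M format follows the effective address size, which is the cpu mode's default toggled by the 0x67 address-size prefix. The sketch below spells that out; the function names are invented for this example and it is not taken from the assembler described here.

```python
def address_size(mode_bits, has_67_prefix):
    """Effective address size in bits for a given cpu mode and 0x67 prefix."""
    if mode_bits == 16:
        return 32 if has_67_prefix else 16
    if mode_bits == 32:
        return 16 if has_67_prefix else 32
    if mode_bits == 64:
        # 64-bit mode has no 16-bit addressing; 0x67 selects 32-bit.
        return 32 if has_67_prefix else 64
    raise ValueError("mode_bits must be 16, 32 or 64")

def modrm_format(mode_bits, has_67_prefix):
    """Which ModR/M layout applies: the 16-bit form has no SIB byte."""
    if address_size(mode_bits, has_67_prefix) == 16:
        return "16-bit form (fixed bx/bp/si/di combinations, no SIB)"
    return "32-bit form (optional SIB)"

def modrm_byte(mod, reg, rm):
    """Pack the three ModR/M fields: mod (2 bits), reg (3 bits), rm (3 bits)."""
    return (mod << 6) | (reg << 3) | rm

# mov eax, [ebp+8] in 32-bit mode: opcode 8B, ModR/M 45, disp8 08
print(hex(modrm_byte(1, 0, 5)))
print(modrm_format(32, True))
```

So the same memory operand needs a different ModR/M encoding, plus or minus a 0x67 prefix, depending on the mode the code will run in.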

should support at least a good portion of the instruction set (including MMX
and SSE1/2), albeit I may have missed some and there are likely errors.

I have also added some features to help with PIC code (on x86, as x86-64 has
rip-relative mode, which is better...).

Syntax

as noted, the syntax is mostly like nasm, but some things are different.

sections are implemented by simply giving the name, such as '.text' or
'.data'.

multiple instructions may be grouped per line via ';', as in:
mov eax, [ebp+8]; push ebp
this is partly because often ';' looks nicer than '\n' in strings.
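
A minimal sketch of how such a line could be split, assuming ';' only counts as a separator outside double-quoted strings. The function name is hypothetical, and escaped quotes and ';' comments are deliberately not handled.

```python
def split_instructions(line):
    """Split one source line into instructions on ';' outside double quotes."""
    parts, current, in_string = [], [], False
    for ch in line:
        if ch == '"':
            in_string = not in_string  # toggle on every quote character
            current.append(ch)
        elif ch == ';' and not in_string:
            parts.append(''.join(current).strip())
            current = []
        else:
            current.append(ch)
    parts.append(''.join(current).strip())
    return [p for p in parts if p]  # drop empty pieces

print(split_instructions('mov eax, [ebp+8]; push ebp'))
```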

in some cases, I have renamed some forms of some instructions (originally, I
had renamed some forms of inc, and separated out the 8-bit jmp/jcc forms,
but now this is handled automatically).

special features are different, for example,
getbase <reg>
setbase <symbol>
and $<name> are used for pic offsets (adjusted by the base and set to use
relative relocation).

as in:
getbase esi ;calc base and load into esi
mov eax, [esi+$str]

by default, the base is the start of the current assembly/module.
not that this is probably all that useful at present.
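
One way to read the base-relative scheme, as a sketch only: `$name` resolves to the symbol's offset minus the base's offset, and `getbase` loads the base's runtime address, so the sum is the symbol's runtime address wherever the module ends up. The symbol table, offsets, and function names below are all hypothetical.

```python
# hypothetical symbol table: name -> offset from the start of the module
symbols = {"basm_main": 0x00, "tststr": 0x40, "table": 0x80}

def pic_offset(name, base_symbol=None):
    """Value of $name: the symbol's offset relative to the current base.

    With no setbase, the base is assumed to be the start of the module.
    """
    base = symbols[base_symbol] if base_symbol else 0
    return symbols[name] - base

def getbase(load_address, base_symbol=None):
    """Runtime value that getbase would place in the register."""
    base = symbols[base_symbol] if base_symbol else 0
    return load_address + base

# the module may be loaded anywhere; [reg+$tststr] still finds the string
for load_address in (0x10000000, 0x7F000000):
    reg = getbase(load_address)
    assert reg + pic_offset("tststr") == load_address + symbols["tststr"]

print(hex(pic_offset("tststr")), hex(pic_offset("table", "tststr")))
```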

API (vague overview):

as noted, it assembles in-place. at present, it just assembles at the end of
some existing buffers, but may be later made to implement a kind of heap (so
that it is possible to allocate and free functions/modules, rather than just
gradually filling up the buffer).

things are grouped into a kind of conceptual unit (which I am calling an
assembly). one uses a call to begin an assembly, then generates the
assembler code via some number of printf-like calls, and then ends the
assembly (at this point, the contents are assembled and a pointer to the
start of the assembly is returned).
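
The begin / print / end flow can be sketched roughly as below. This is not the real API: the class and method names are invented, the "assembler" is a lookup table of four fixed encodings, and an offset into a shared buffer stands in for the returned pointer.

```python
# toy encoder: only these exact instruction spellings are understood
ENCODINGS = {
    "push ebp": b"\x55",
    "mov ebp, esp": b"\x89\xe5",
    "pop ebp": b"\x5d",
    "ret": b"\xc3",
}

class Assembler:
    def __init__(self):
        self.buffer = bytearray()  # output is appended at the end of this
        self.pending = None        # text collected for the open assembly

    def begin(self):
        self.pending = []

    def emit(self, fmt, *args):
        """printf-like call: format the text and queue it for assembly."""
        self.pending.append(fmt % args)

    def end(self):
        """Assemble the queued text; return the offset where it starts."""
        start = len(self.buffer)
        text = "".join(self.pending)
        for line in text.replace(";", "\n").splitlines():
            insn = " ".join(line.split())  # normalize whitespace
            if insn:
                self.buffer += ENCODINGS[insn]
        self.pending = None
        return start

asm = Assembler()
asm.begin()
asm.emit("push %s; mov ebp, esp\n", "ebp")
asm.emit("pop ebp\nret\n")
first = asm.end()
print(first, asm.buffer.hex())
```

Each later assembly simply lands after the previous one in the same buffer, which matches the "gradually filling up the buffer" behavior described above.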

JIT compilers typically do one function per assembly, but the asm loader
does a whole file per assembly (this will be the fundamental unit of
allocation/freeing, if/when implemented). another (but more complicated)
possibility would include symbolic garbage collection (likely via
ref-counting).

it is not online anywhere as of yet, but if anyone wants to look at the
source, I can probably email it to them (just ask via email or such, I can
also answer questions).

as noted, this is not intended as an end-user app/tool/lib, but if I am
lucky, maybe a potentially useful library (or as a starting point for other
interesting projects/libraries).

IMO, the instruction listing format is much cleaner than nasm's (and is
notably closer to the form found in the intel/amd docs). if anything, maybe
I have done something by typing all this crap out (of course, nasm has some
nifty pieces of info I currently lack, like supported archs, ...).

a trivial example (calls printf, uses relative addressing):
.text

basm_main:
push ebp
mov ebp, esp

getbase ecx

lea eax, [ecx+$tststr]
push eax

call printf
pop ecx

pop ebp
ret

.data
tststr db "asm test string\n", 0

and a listing fragment (basic syntax):
add
04,ib al,i8
X80/0,ib rm8,i8
WX83/0,ib rm16,i8
TX83/0,ib rm32,i8
X83/0,ib rm64,i8
X02/r r8,rm8

where W/T/X/... tell where prefixes go (Word, DWord, REX).
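
A sketch of how lines in this listing format might be parsed. The field meanings are guesses based on the intel-style notation (`/0` or `/r` for the ModR/M reg field, `ib` for an immediate byte), only the W/T/X prefix markers named above are recognized, and all function and key names are made up for this example.

```python
import re

# encoding field: optional prefix markers, hex opcode bytes,
# optional /digit or /r, then zero or more ",imm" suffixes
FORM = re.compile(
    r"^(?P<prefixes>[WTX]*)"
    r"(?P<opcode>(?:[0-9A-Fa-f]{2})+)"
    r"(?:/(?P<ext>[0-7r]))?"
    r"(?P<imm>(?:,\w+)*)$"
)

def parse_form(line):
    """Parse one 'encoding operands' line into a dict."""
    encoding, operands = line.split()
    m = FORM.match(encoding)
    if m is None:
        raise ValueError("unrecognized encoding: " + encoding)
    return {
        "prefixes": list(m.group("prefixes")),
        "opcode": bytes.fromhex(m.group("opcode")),
        "modrm": m.group("ext"),  # '0'..'7', 'r', or None
        "immediates": [p for p in m.group("imm").split(",") if p],
        "operands": operands.split(","),
    }

def parse_listing(text):
    """A single-token line names the mnemonic; following lines are its forms."""
    table, mnemonic = {}, None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if len(fields) == 1:
            mnemonic = fields[0]
            table[mnemonic] = []
        else:
            table[mnemonic].append(parse_form(line))
    return table

listing = """add
04,ib al,i8
X80/0,ib rm8,i8
WX83/0,ib rm16,i8
TX83/0,ib rm32,i8
X83/0,ib rm64,i8
X02/r r8,rm8
"""
table = parse_listing(listing)
print(table["add"][2])
```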

any comments?...

will keep sig, this time, in case anyone cares:
--
BGB, 23 M S GU
cr88192 at hotmail dot com

SpooK

2007-04-05 23:15:23 UTC