Post by David Froble
Post by Johnny Billquist
Post by David Froble
Post by Stephen Hoffman
Post by John Reagan
From the C99 standard (sorry for the poor cut n paste on my phone).
A string is a contiguous sequence of characters terminated by and
including the first null character. The term multibyte string is
sometimes used instead to emphasize special processing given to
multibyte characters contained in the string or to avoid confusion
with a wide string. A pointer to a string is a pointer to its initial
(lowest addressed) character. The length of a string is the number of
bytes preceding the null character and the value of a string is the
sequence of the values of the contained characters, in order.
Ayup. C string handling sometimes makes me long for BASIC.
(Somebody please check on David. I think he may have fainted there.)
Nope. Take more than that.
But, it's not the language, it's the compiler using decent library
routines. I could argue that Basic doesn't do anything with strings.
At least from the perspective that it's mostly or all library calls.
Partly, and partly not. BASIC does not really have this as libraries,
even though the implementation might sit in some library. It's a part
of the language itself, and you cannot really separate the things in
BASIC.
Z2$MAIN
1 1 Z1$ = "abc"
2 Z2$ = "123"
3 Z3$ = Z1$ + Z2$
4 End
Z2$MAIN
Generated code
0000: .PSECT $CODE
CFFC 0000: .WORD
^M<R2,R3,R4,R5,R6,R7,R8,R9,R10,R11,I>
52 FB AF 9E 0002: MOVAB .-3, R2
50 00000004 0G 9E 0006: MOVAB $PDATA+4, R0
51 50 D0 000D: MOVL R0, R1
00000000 GG 16 0010: JSB BAS$INIT_R8
FC AD FD AF 9E 0016: $L_1: MOVAB $L_1, -4(FP)
51 03 32 001B: CVTWL #3, R1
52 00000063 0G 9E 001E: MOVAB $PDATA+99, R2
50 5B AB 7E 0025: MOVAQ Z1$(R11), R0
00000000 GG 16 0029: JSB STR$COPY_R_R8
51 03 32 002F: CVTWL #3, R1
52 00000060 0G 9E 0032: MOVAB $PDATA+96, R2
50 63 AB 7E 0039: MOVAQ Z2$(R11), R0
00000000 GG 16 003D: JSB STR$COPY_R_R8
63 AB 7F 0043: PUSHAQ Z2$(R11)
5B AB 7F 0046: PUSHAQ Z1$(R11)
6B AB 7F 0049: PUSHAQ Z3$(R11)
00000000 GG 03 FB 004C: CALLS #3, STR$CONCAT
50 00000004 0G 9E 0053: MOVAB $PDATA+4, R0
00000000 GG 16 005A: JSB BAS$END_R8
50 01 D0 0060: MOVL #1, R0
04 0063: RET
0064: .END
Some snipping to attempt to make it fit and readable.
Note that the compiler does not produce any code to do any of the
operations. All it's doing is pushing arguments and invoking a library
routine. This is an example of what I've tried to say when I claim that
Basic doesn't really do much of the work.
I know. But my point is that this is an implementation detail of the
specific compiler that you are using. You cannot see that it is a
library routine that is called from your code, you do not in any way
control what library routine to call, and this could all change without
your code being aware of any of it.
This is because you are actually invoking standard language features in
your source code. Exactly how those are implemented is not the same
thing as some reference to some external library explicit in the code.
Post by David Froble
And a John Reagan question. Why is there a RET at the end of a main?
It's not a subroutine.
I cannot fully explain how the compiler looks at all of this, but it
certainly looks like it sets up R0 with a success exit status code
before the RET.
Also, Z2$MAIN starts with an entry mask, which makes me suspect that
there is generic startup code in the BASIC RTS, which then calls your
program through a generic call, and at the end you normally return to
the RTS code for cleanup before actual program exit.
Post by David Froble
Post by Johnny Billquist
Post by David Froble
C just needs to be better at which library routines it uses ....
The problem (or one problem) with C is that the language doesn't really
have strings. You have pointers. And arrays... But actually, arrays
are pretty much just pointers as well. And strings are just arrays of
integers. And so, you have some convention on how to treat some arrays
as strings, in some special ways and cases. But the language still does
not have strings.
I'll agree, the C compiler doesn't know about strings. But what is a
string, or any other variable or literal? It's an address, and perhaps
some more data, such as length. Which is how strings can be used in C,
by setting up a structure which includes an address, a length, and
perhaps some other data. Then all that's required is a library routine
to work with the string.
The difference in Basic is that the compiler will set up the descriptor.
There are way more differences between them...
In BASIC, you can return a string from a function. It is a type. In C
you cannot, as that would require that you could return an array of
unknown size. What you can return in C is a pointer, which can be
pointing to an array. But that is then an object that you might end up
with many references to, and whose scope you need to be careful about,
and you might need to keep track of ownership and eventual memory freeing.
Also, since strings are an actual type in BASIC, you can copy between
them, modify them, and so on, without worrying about unintentional side
effects. In C, as strings don't really exist, and you actually just have
pointers, you need to pay much more attention to what you are doing. If
you want to modify a string, you need to make a copy of it, and then
modify that one. You can also not easily take substrings from a string,
without risk of corrupting the original string.
Finally, since STRINGs in BASIC have size information, you cannot go
outside and unintentionally clobber random memory. As strings are just
array pointers in C, and pointer arithmetic can lead you anywhere, you
can address all memory when you think you are playing with your string.
Something like:
x = "abc"[7];
in C is a nice illustration of the "problem". If you could even express
this in BASIC that way, it would be an error because of being out of
range. In C, that compiles (the behaviour is formally undefined), and
will give you something. Who knows what.
All of these are then sources of potential bugs in C code that people
lament so often. :-)
Post by David Froble
Post by Johnny Billquist
So, obviously, dealing with strings is always going to be a ride.
(But I do like C, I just don't consider it to be the solution to all
the problems in the world.)
Most of the discussions here, however, seem to focus too much on
languages, as if that is the problem and the solution. In the end, in
my experience, it's all about good programmers. Bad ones will create
problems no matter what language they write in. And unfortunately, bad
programmers outnumber good ones by a big margin, and it's only getting
worse as both academia and industry now think that the tools are the
solution to all problems.
Now there you got my 100% agreement.
Thanks.
I'd like to add that while having languages that help get rid of
some problems is useful, the fact is that no language or compiler can
figure out when a programmer writes semantic bugs. All they can catch is
syntactic bugs. So no tool in the world is ever going to be able to
catch the more serious and nefarious bugs. Bad programmers will continue
to be a headache, but academia and industry are now doing themselves a
disservice by making other people believe that there is a solution to
the problem of buggy programs, and they do not have to try and hire that
clever person who actually knows how to write code, but costs so much
money...
Oh well. Some companies do understand, which is enough for me...
Johnny
--
Johnny Billquist || "I'm on a bus
|| on a psychedelic trip
email: ***@softjar.se || Reading murder books
pdp is alive! || tryin' to stay hip" - B. Idol