Discussion:
_swap as an operator in the C language?
Thiago Adams
2016-12-17 12:39:00 UTC
I think that if _swap were an operator in C, we would gain several advantages.

* The compiler could optimize it according to the operand types
* It would avoid hand-made macros or code that is always the same

Problems?
In some cases the swap is customized. In those cases the built-in swap would not be used.

Language level of abstraction
* I think the same principles that apply to struct copy (operator =) also apply to _swap.

I don't know whether intrinsic functions could do the job, because of the type signatures.

What do you think?
BartC
2016-12-17 14:19:17 UTC
Post by Thiago Adams
I think if _swap where an operator in C we would have many advantages.
* Optimized by the compiler considering the types
* Avoid of hand made macros or code that is always the same
Problems?
In some cases the swap is something customized. In this case the build-swap would not be used.
Language level of abstraction
* I think the same principles that applies for struct copy (operator =) applies for _swap.
I don't know if intrinsic functions could do the job because of the signature of types?
What do you think?
I think it's going to be a hard sell (especially to people who have
already rejected binary literals and separators within numbers).

swap() would most likely be of use, IMO, in exchanging primitive types,
rather than structs or arrays. You could do the latter, but it's
inefficient so is usually avoided. And then it can be done using a
function along the lines of memcpy where the data is an anonymous block
of bytes, not a type.

And swapping primitive types can now be done by a combination of
_Generic, and inline functions. Or you just use custom code each time:
{T t=x; x=y; y=t;}.
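
A minimal sketch of the _Generic plus inline functions approach mentioned above, assuming C11. The names SWAP, swap_int, swap_double and swap_charp are hypothetical and only illustrate the idea; every additional type needs its own helper and its own _Generic entry, which is the cost being discussed in this thread.

```c
#include <assert.h>

/* One small inline helper per supported type (hypothetical names). */
static inline void swap_int(int *a, int *b)          { int t = *a; *a = *b; *b = t; }
static inline void swap_double(double *a, double *b) { double t = *a; *a = *b; *b = t; }
static inline void swap_charp(char **a, char **b)    { char *t = *a; *a = *b; *b = t; }

/* SWAP selects a helper from the type of x.
   A type without an entry is rejected at compile time. */
#define SWAP(x, y) _Generic((x),        \
        int:    swap_int,               \
        double: swap_double,            \
        char *: swap_charp)(&(x), &(y))

/* Small demonstration of the macro on two of the supported types. */
static int swap_generic_demo(void)
{
    int i = 1, j = 2;
    double p = 1.5, q = 2.5;

    SWAP(i, j);
    SWAP(p, q);
    assert(i == 2 && j == 1);
    assert(p == 2.5 && q == 1.5);
    return 1;
}
```

Both arguments must be lvalues, because the macro takes their addresses.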

Perhaps 'typeof' would be a better bet for a new feature (but then I can
see _Generic being of use there too. Between them macros and generics
have probably put paid to most new features!).
--
Bartc
s***@casperkitty.com
2016-12-17 17:02:52 UTC
Post by BartC
Post by Thiago Adams
I think if _swap where an operator in C we would have many advantages.
* Optimized by the compiler considering the types
* Avoid of hand made macros or code that is always the same
Problems?
In some cases the swap is something customized. In this case the build-swap would not be used.
Language level of abstraction
* I think the same principles that applies for struct copy (operator =) applies for _swap.
I don't know if intrinsic functions could do the job because of the signature of types?
What do you think?
I think it's going to be a hard sell (especially to people who have
already rejected binary literals and separators within numbers).
Of course, that doesn't mean that the language shouldn't include such an
operator--it could just be that the Standard is being maintained by people
whose ideas of usefulness don't necessarily coincide with the views of
those using the language.
Post by BartC
swap() would most likely be of use, IMO, in exchanging primitive types,
rather than structs or arrays. You could do the latter, but it's
inefficient so is usually avoided. And then it can be done using a
function along the lines of memcpy where the data is an anonymous block
of bytes, not a type.
I would suggest that the possibility of implementing something like swap
using memcpy should be an argument *in favor* of adding it to the language.
If something could be implemented using existing means using portable
macros, then the burden imposed on compiler writers by adding it to the
Standard would be essentially zero *in cases where the macros would work
adequately*--all they'd have to do would be to throw into a .h file some
macros supplied by the Committee. On the other hand, it would be easier
to have a compiler include an intrinsic to efficiently swap two objects
than to have it recognize all the ways in which code might try to swap
two objects and arrange the operations for optimal performance.

One of the design goals of C was to make it possible for even a simple
compiler to generate efficient code. The idea that compilers should try
to use high-level analysis to figure out that code is simply trying to
swap two items, so that it can then generate efficient item-swap code,
seems both unhelpful and fundamentally contrary to the purpose of making
C an easy-to-process language.
Thiago Adams
2016-12-18 09:59:06 UTC
Post by BartC
Post by Thiago Adams
I think if _swap where an operator in C we would have many advantages.
* Optimized by the compiler considering the types
* Avoid of hand made macros or code that is always the same
Problems?
In some cases the swap is something customized. In this case the build-swap would not be used.
Language level of abstraction
* I think the same principles that applies for struct copy (operator =) applies for _swap.
I don't know if intrinsic functions could do the job because of the signature of types?
What do you think?
I think it's going to be a hard sell (especially to people who have
already rejected binary literals and separators within numbers).
swap() would most likely be of use, IMO, in exchanging primitive types,
rather than structs or arrays. You could do the latter, but it's
inefficient so is usually avoided. And then it can be done using a
function along the lines of memcpy where the data is an anonymous block
of bytes, not a type.
Assignment doesn't work for arrays today.

int a[5];
int b[5];
a = b; // error: assignment to expression with array type

So it would not be so bad if swap were not applicable to arrays.
I don't remember ever swapping the contents of arrays in my code.
If this assignment were allowed someday, then swap could be too.
Post by BartC
And swapping primitive types can now be done by a combination of
_Generic, and inline functions.
Do you mean _Generic with all primitive types?
I have a lot of uses for pointers. So, to avoid adding each pointer type to _Generic, I would have to cast to void*:

_swap( (void*)pA, (void*) pB);

Most uses of swap in my code are for pointers, and for structs that contain pointers where the struct is responsible for the pointed-to content.


If the programmer is not happy with the built-in swap, he could just ignore it.
I think the expected implementation would be at least as good (or bad) as copying to a temporary and then exchanging the contents.
BartC
2016-12-18 13:13:22 UTC
Post by Thiago Adams
Post by BartC
swap() would most likely be of use, IMO, in exchanging primitive types,
rather than structs or arrays. You could do the latter, but it's
inefficient so is usually avoided. And then it can be done using a
function along the lines of memcpy where the data is an anonymous block
of bytes, not a type.
The assignment for arrays doesn't work today.
int a[5];
int b[5];
a = b; // error: assignment to expression with array type
so, it would not be so bad if swap was not applicable for arrays.
I don't remember in my code to swap array's content.
If you have a small type, say (x,y,z) defining a point, or (r,g,b,a) for
a colour or pixel, then either could be defined as a struct or a short
array. But because of C's limitations with value arrays (you can't even
pass or return a 4-byte array) then you would probably avoid using them.
Maybe that's why swapping arrays doesn't come up often!
Post by Thiago Adams
If this assignment was allowed someday then swap also could be.
It's allowed as memcpy(a,b,sizeof a). And swapping can be done with a
third, temporary array and three memcpys. (Or, if swapping of primitives
was possible, with a loop swapping corresponding elements of a and b.)
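
Both variants described above can be sketched as follows. This is only an illustration: the length LEN and the function names are assumptions made for the example, and the memcpy variant requires that a and b are distinct, non-overlapping arrays.

```c
#include <assert.h>
#include <string.h>

enum { LEN = 5 };   /* assumed array length for the example */

/* Variant 1: a third, temporary array and three memcpy calls.
   a and b must not be the same array (memcpy forbids overlap). */
static void swap_via_memcpy(int (*a)[LEN], int (*b)[LEN])
{
    int t[LEN];

    memcpy(t, *a, sizeof t);    /* t = a */
    memcpy(*a, *b, sizeof t);   /* a = b */
    memcpy(*b, t, sizeof t);    /* b = t */
}

/* Variant 2: swap corresponding elements in a loop. */
static void swap_elementwise(int *a, int *b, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        int t = a[i];
        a[i] = b[i];
        b[i] = t;
    }
}

static int swap_arrays_demo(void)
{
    int a[LEN] = {1, 2, 3, 4, 5};
    int b[LEN] = {6, 7, 8, 9, 10};

    swap_via_memcpy(&a, &b);
    assert(a[0] == 6 && a[4] == 10 && b[0] == 1 && b[4] == 5);

    swap_elementwise(a, b, LEN);    /* swaps the contents back */
    assert(a[0] == 1 && a[4] == 5 && b[0] == 6 && b[4] == 10);
    return 1;
}
```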
Post by Thiago Adams
Post by BartC
And swapping primitive types can now be done by a combination of
_Generic, and inline functions.
Do you mean _Generic with all primitive types?
I have a lot of uses for pointers. So, to avoid to add each type on _Generic I would have to cast to void*
_swap( (void*)pA, (void*) pB);
Actually, for exchanging data, primitive types can also be considered
just blocks of bytes. So the swap operator reduces to exchanging blocks
of 1, 2, 4 or 8 bytes; or, for structs and arrays, N bytes. (For arrays,
N must be obtainable by applying sizeof to the array.)

Other than size, I don't think type needs to come into it, unless there
is some technical reason why exchanging two doubles for example needs to
be done via floating point registers. Or for moving a struct to be done
field by field, so that any padding bytes are not moved.
Post by Thiago Adams
Most of the use of swap in my code is for pointers and for structs that have pointers inside where the struct is responsible for the pointer content.
There will be some pointer-related issues but they are obvious ones.
Suppose there are structs A and B, and there is a pointer to an element
A.m. Then after exchange, the pointer will no longer point to what it
thought it was pointing to.

But this occurs in all sorts of situations: realloc()-ing an array for
example that had had pointers to elements. More serious now as the
original pointer target might no longer be in valid memory.
Post by Thiago Adams
If the programmer is not happy with build in swap he could just ignore.
I think the expect implementation could be at least as good/bad the copy to temporary and then exchange contents.
Do you have actual examples of bad swap code being generated (from
source like t=x; x=y; y=t) and how it could be better?
--
Bartc
s***@casperkitty.com
2016-12-18 18:37:23 UTC
Post by BartC
Do you have actual examples of bad swap code being generated (from
source like t=x; x=y; y=t) and how it could be better?
On many implementations, given two arrays of "int", the optimal way
to perform a swap will entail loading groups of two or four ints
from each array, storing each group to the other array, and then
repeating the process. On many implementations, if source code
swaps elements one byte at a time, the generated machine code will do
likewise, and if source code fetches elements in groups which are
too large to fit in registers the generated machine code will spill
them out to temporary locations, negating the benefits of consolidated
accesses. An intrinsic to swap things (preferably with semantics that
would define behavior in case source and destination pointers match
precisely, but without needing to define other overlap cases) could
generate code that uses the best size of chunk for the target, without
the programmer having to know what that should be.
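
As a rough illustration of the chunked approach, here is a portable sketch that swaps two non-overlapping blocks through a small fixed-size buffer. The name swap_bytes and the 64-byte chunk size are assumptions made for this example; a real intrinsic would pick the chunk size for the target, which is exactly what portable source code cannot do. Identical pointers are handled as a no-op, and partially overlapping blocks are not supported.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { SWAP_CHUNK = 64 };   /* assumed chunk size, not tuned for any target */

/* Swap n bytes between two blocks that are either identical or disjoint. */
static void swap_bytes(void *p, void *q, size_t n)
{
    unsigned char *a = p;
    unsigned char *b = q;
    unsigned char tmp[SWAP_CHUNK];

    if (a == b)
        return;                 /* same object: nothing to do */

    while (n > 0) {
        size_t k = n < SWAP_CHUNK ? n : SWAP_CHUNK;

        memcpy(tmp, a, k);
        memcpy(a, b, k);
        memcpy(b, tmp, k);
        a += k;
        b += k;
        n -= k;
    }
}

static int swap_bytes_demo(void)
{
    int a[100], b[100];

    for (int i = 0; i < 100; i++) {
        a[i] = i;
        b[i] = -i;
    }
    swap_bytes(a, b, sizeof a);     /* several full chunks plus a remainder */
    swap_bytes(a, a, sizeof a);     /* identical pointers: no-op */
    assert(a[1] == -1 && a[99] == -99);
    assert(b[1] == 1 && b[99] == 99);
    return 1;
}
```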
Keith Thompson
2016-12-18 20:35:19 UTC
[...]
Post by Thiago Adams
Post by BartC
And swapping primitive types can now be done by a combination of
_Generic, and inline functions.
Do you mean _Generic with all primitive types?
I have a lot of uses for pointers. So, to avoid to add each type on
_Generic I would have to cast to void*
_swap( (void*)pA, (void*) pB);
A cast expression is not an lvalue; `(void*)pA` just yields the
converted *value* of pA, with no reference to the object.

Even if you managed to define the semantics so the pointers are
converted to void* and back, it wouldn't cover function pointers,
and the conversion would be overkill on implementations where the
conversion is non-trivial.
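
One way around the cast problem, sketched here under the assumption that C11 is available, is a macro that takes the two lvalues directly and exchanges their object representations with memcpy. The name SWAP_OBJ is hypothetical. Because no pointer conversion takes place, it also works for function pointers; the trade-off is that only the sizes are checked, not the types, and the two operands must be distinct objects.

```c
#include <assert.h>
#include <string.h>

/* Exchange the bytes of two distinct lvalues of equal size.
   Only sizeof is compared; type compatibility is NOT checked. */
#define SWAP_OBJ(a, b) do {                                             \
        _Static_assert(sizeof(a) == sizeof(b), "operand sizes differ"); \
        unsigned char swap_tmp_[sizeof(a)];                             \
        memcpy(swap_tmp_, &(a), sizeof(a));                             \
        memcpy(&(a), &(b), sizeof(a));                                  \
        memcpy(&(b), swap_tmp_, sizeof(a));                             \
    } while (0)

static int one(void) { return 1; }
static int two(void) { return 2; }

static int swap_obj_demo(void)
{
    int x = 10, y = 20;
    int *pa = &x, *pb = &y;
    int (*fa)(void) = one, (*fb)(void) = two;

    SWAP_OBJ(pa, pb);   /* object pointers, no cast to void* needed */
    SWAP_OBJ(fa, fb);   /* function pointers work the same way */
    assert(*pa == 20 && *pb == 10);
    assert(fa() == 2 && fb() == 1);
    return 1;
}
```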
--
Keith Thompson (The_Other_Keith) kst-***@mib.org <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
Keith Thompson
2016-12-18 20:39:30 UTC
Post by Thiago Adams
I think if _swap where an operator in C we would have many advantages.
[...]
Post by Thiago Adams
I don't know if intrinsic functions could do the job because of the signature of types?
Intrinsic functions can do anything you like. C doesn't currently
have intrinsic functions, so you're free to define their semantics.
I doubt that the committee would be willing to add such a major
new feature just to permit swapping.

A memswap() function would be one reasonable approach.
s***@casperkitty.com
2016-12-18 23:00:08 UTC
Post by Keith Thompson
Post by Thiago Adams
I don't know if intrinsic functions could do the job because of the signature of types?
Intrinsic functions can do anything you like. C doesn't currently
have intrinsic functions, so you're free to define their semantics.
I doubt that the committee would be willing to add such a major
new feature just to permit swapping.
A memswap() function would be one reasonable approach.
Adding intrinsic functions that would need to consider argument types for
semantic correctness would be a major change. On the other hand, I would
think there could be some usefulness to having a family of mem* functions
which use the types of the arguments to make aliasing inferences (and then
getting rid of memcpy's insanely broken aliasing rules which allow far
fewer useful optimization opportunities while needlessly complicating what
should be simple operations).

Specifying that if the arguments to memswap() are both of type void*, a
compiler must recognize aliasing of any type, but if either is a pointer
to any specific kind of object a compiler may assume that it will not
alias objects of other types, would allow programmers to achieve the full
level of semantic power that would be available without any aliasing
restrictions (they could cast pointers to void* when aliasing was required)
but allow more useful optimizations than are presently possible under rules
that would try to make it carry through the effective types of objects. If
a compiler needs to handle memswap() [or memcpy, or memmove] as a straight
function call, it could do so and ignore the types of the arguments, provided
that it presumed that the function might access objects of any type. On
the other hand, using the argument type as a presumption for what could be
aliased would make a lot more sense than applying silly effective-type rules
like C99 uses for memcpy.
Thiago Adams
2016-12-19 16:31:18 UTC
Post by Keith Thompson
Post by Thiago Adams
I think if _swap where an operator in C we would have many advantages.
[...]
Post by Thiago Adams
I don't know if intrinsic functions could do the job because of the signature of types?
Intrinsic functions can do anything you like. C doesn't currently
have intrinsic functions, so you're free to define their semantics.
I doubt that the committee would be willing to add such a major
new feature just to permit swapping.
A memswap() function would be one reasonable approach.
Using the same logic, can I say that operator = is not reasonable for structs, and that the committee should not have approved it and should instead have suggested the use of memcpy?
Keith Thompson
2016-12-19 20:46:04 UTC
Post by Thiago Adams
Post by Keith Thompson
Post by Thiago Adams
I think if _swap where an operator in C we would have many advantages.
[...]
Post by Thiago Adams
I don't know if intrinsic functions could do the job because of the
signature of types?
Intrinsic functions can do anything you like. C doesn't currently
have intrinsic functions, so you're free to define their semantics.
I doubt that the committee would be willing to add such a major
new feature just to permit swapping.
A memswap() function would be one reasonable approach.
Using the same logic, can I say that the operator = is not reasonable
for structs and the committe should not have approved it and instead
sujest the use of memcpy ?
Not really. I was commenting on the idea of intrinsic functions.
"=" is an operator, not a function, intrinsic or otherwise.
It already existed for scalar types; changing it to cover structs
and unions was a straightforward change.

If you want a swap *operator*, you could pick a symbol for it
(perhaps "<=>") and define its precedence and semantics (both
operands must be lvalues of the same or compatible types -- I haven't
thought through all the details). It could even apply to arrays if
the operands of "<=>" became a 4th exception to the array-to-pointer
conversion rule. I wouldn't object to such a change, but I doubt
that the committee would consider it to be sufficiently useful to
add to the language, given that swapping is only slightly awkward
with existing features.
Thiago Adams
2016-12-19 21:45:36 UTC
Post by Keith Thompson
Post by Thiago Adams
Post by Keith Thompson
Post by Thiago Adams
I think if _swap where an operator in C we would have many advantages.
[...]
Post by Thiago Adams
I don't know if intrinsic functions could do the job because of the
signature of types?
Intrinsic functions can do anything you like. C doesn't currently
have intrinsic functions, so you're free to define their semantics.
I doubt that the committee would be willing to add such a major
new feature just to permit swapping.
A memswap() function would be one reasonable approach.
Using the same logic, can I say that the operator = is not reasonable
for structs and the committe should not have approved it and instead
sujest the use of memcpy ?
Not really. I was commenting on the idea of intrinsic functions.
"=" is an operator, not a function, intrinsic or otherwise.
It already existed for scalar types; changing it to cover structs
and unions was a straightforward change.
If you want a swap *operator*, you could pick a symbol for it
(perhaps "<=>") and define its precedence and semantics (both
operands must be lvalues of the same or compatible types -- I haven't
thought through all the details). It could even apply to arrays if
the operands of "<=>" became a 4th exception to the array-to-pointer
conversion rule. I wouldn't object to such a change, but I doubt
that the committee would consider it to be sufficiently useful to
add to the language, given that swapping is only slightly awkward
with existing features.
I think the best approach is to make it look like a function call. It can be defined at the expression level, just like sizeof or a cast.
A function-like style allows a macro fallback in case you have to compile with older compilers.

Of course, if it were an operator like <>, then someone could create a macro:
#define _SWAP(a, b) (a) <> (b)
Keith Thompson
2016-12-19 22:22:03 UTC
[...]
Post by Thiago Adams
Post by Keith Thompson
If you want a swap *operator*, you could pick a symbol for it
(perhaps "<=>") and define its precedence and semantics (both
operands must be lvalues of the same or compatible types -- I haven't
thought through all the details). It could even apply to arrays if
the operands of "<=>" became a 4th exception to the array-to-pointer
conversion rule. I wouldn't object to such a change, but I doubt
that the committee would consider it to be sufficiently useful to
add to the language, given that swapping is only slightly awkward
with existing features.
I think the best approach is to have it similar to functions. It can
be defined at expression level just like sizeof or cast. Function
like style allows a macro in case you have to compile for previous
compilers.
The sizeof and cast operators are not functions, nor are they similar to
functions. A sizeof expression can have a superficial resemblance to a
function call, but only because the operator symbol happens to be a
keyword rather than a punctuator. Keywords play a very different
syntactic role from non-keyword identifiers.

[...]
Thiago Adams
2016-12-20 10:55:03 UTC
Post by Keith Thompson
[...]
Post by Thiago Adams
Post by Keith Thompson
If you want a swap *operator*, you could pick a symbol for it
(perhaps "<=>") and define its precedence and semantics (both
operands must be lvalues of the same or compatible types -- I haven't
thought through all the details). It could even apply to arrays if
the operands of "<=>" became a 4th exception to the array-to-pointer
conversion rule. I wouldn't object to such a change, but I doubt
that the committee would consider it to be sufficiently useful to
add to the language, given that swapping is only slightly awkward
with existing features.
I think the best approach is to have it similar to functions. It can
be defined at expression level just like sizeof or cast. Function
like style allows a macro in case you have to compile for previous
compilers.
The sizeof and cast operators are not functions, nor are they similar to
functions. A sizeof expression can have a superficial resemblance to a
function call, but only because the operator symbol happens to be a
keyword rather than a punctuator. Keywords play a very different
syntactic role from non-keyword identifiers.
My first thought was to add it to the grammar at the same position as sizeof/cast.

But, now, I would add it at:


1) If swap were the operator <>:

assignment-expression:
    conditional-expression
    unary-expression assignment-operator assignment-expression

assignment-operator: one of
    = *= /= %= += -= <<= >>= &= ^= |=
    <>        (swap operator, added here)

2) If swap were a keyword:

assignment-expression:
    conditional-expression
    unary-expression assignment-operator assignment-expression
    swap-expression        (added here)

swap-expression:
    _swap ( unary-expression , unary-expression )
Kaz Kylheku
2016-12-20 12:41:46 UTC
Post by Thiago Adams
swap-expression
_swap ( unary-expression, unary-expression)
It's pretty stupid to add this kind of thing as a primitive ("special
form", in Lisp terms) rather than providing a way for it to be written
in the language.

Remember that one of the selling points of C over Pascal and its ilk is
that the library, including I/O and formatted printing, could all be
written in C. Not like that silly "writeln" cruft that has to be wired
into the parser, and has its own syntax that cannot be imitated by
a programmer-written wrapper.

There is already a descendant dialect of C in which you can write swap,
namely C++.

You don't have to, because std::swap is provided.

If developing C is beating a dead horse, adding a swap operator is like
grafting a horn onto a dead horse's forehead to make a unicorn.
Thiago Adams
2016-12-20 13:31:07 UTC
Post by Kaz Kylheku
Post by Thiago Adams
swap-expression
_swap ( unary-expression, unary-expression)
It's pretty stupid to add this kind of thing as a primitive ("special
form", in Lisp terms) rather than providing a way for it to be written
in the language.
Remember that one of the selling points of C over Pascal and its ilk is
that the library, including I/O and formatted printing, could all be
written in C. Not like that silly "writeln" cruft that has to be wired
into the parser, and has its own syntax that cannot be imitated by
a programmer-written wrapper.
There is already a descendant dialect of C in which you can write swap,
namely C++.
You don't have to, because std::swap is provided.
If developing C is beating a dead horse, adding a swap operator is like
grafting a horn onto a dead horse's forehead to make a unicorn.
To use swap in C++ you need to include <algorithm> (or <utility> since C++11). Types that provide their own swap need to add a member function swap and a non-member function swap, so that the right swap is found through the name lookup rules.
If we consider swap something fundamental and basic, then it could also be added to C++ as an operator.
For C++, users could override this operator on a per-class basis, like new. C++ also has a built-in concept of move that is similar to swap.

The question of whether something must be built in or not is interesting, because every choice has drawbacks. In C++ in particular, you can get long error messages from templates because the compiler has no notion of the concept involved. They are trying to improve this by adding constraints to templates, which could produce better error messages.

Sample of error message:

vector<int> v1;
vector<double> v2;

swap(v1, v2);

error C2664: 'void std::swap(std::exception_ptr &,std::exception_ptr &) throw()': cannot convert argument 1 from 'std::vector<int,std::allocator<_Ty>>' to 'std::exception_ptr &'
with
[
_Ty=int
]

If swap were built in, the message could be:
Swap operator cannot be applied to operands of different types.

In C, if we had a satisfactory way to do it, then of course I wouldn't have suggested it. Do we have one?

We can think of other ways, such as templates for C, better macros, or typeof plus GCC's ({ }) statement expressions.
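
For reference, the typeof route can be sketched like this. It relies on __typeof__, a GNU C extension accepted by GCC and Clang (not ISO C11), and the macro name SWAP_T is hypothetical. A plain do/while block is enough here; the ({ }) statement expression is only needed when the macro has to yield a value.

```c
#include <assert.h>

/* GNU C extension: __typeof__(a) names the type of a, so the temporary
   gets the right type without the caller spelling it out. */
#define SWAP_T(a, b) do {                   \
        __typeof__(a) swap_tmp_ = (a);      \
        (a) = (b);                          \
        (b) = swap_tmp_;                    \
    } while (0)

struct point { int x, y; };

static int swap_typeof_demo(void)
{
    struct point p = {1, 2}, q = {3, 4};
    const char *s = "left", *t = "right";

    SWAP_T(p, q);   /* whole structs, via ordinary struct assignment */
    SWAP_T(s, t);   /* pointers, no cast needed */
    assert(p.x == 3 && p.y == 4 && q.x == 1 && q.y == 2);
    assert(s[0] == 'r' && t[0] == 'l');
    return 1;
}
```

Since the macro body uses ordinary assignment, operands of incompatible types are rejected by the compiler's normal assignment checks.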


I think a new language could exist that extends C, with the two designed together. The C language itself would be focused on generating code. To add something new to it, we would need a good reason in terms of generating better code. (swap could be in that category: would it generate better code?) Why was const added? Does it help code generation? Why was operator = added for structs? We could write it ourselves.

And the second language, let's call it "C with extensions", could carry proposals that improve general usability and security. Something for humans to use, not machines.
I consider that C already has features like that. For instance, I consider enums something for humans, because for the generated code they are no different from constant macros. It's a feature to make the language more usable. Another example is the declaration of variables inside for. Was the generated code improved?
Kaz Kylheku
2016-12-20 13:42:37 UTC
Post by Thiago Adams
Post by Kaz Kylheku
Post by Thiago Adams
swap-expression
_swap ( unary-expression, unary-expression)
It's pretty stupid to add this kind of thing as a primitive ("special
form", in Lisp terms) rather than providing a way for it to be written
in the language.
Remember that one of the selling points of C over Pascal and its ilk is
that the library, including I/O and formatted printing, could all be
written in C. Not like that silly "writeln" cruft that has to be wired
into the parser, and has its own syntax that cannot be imitated by
a programmer-written wrapper.
There is already a descendant dialect of C in which you can write swap,
namely C++.
You don't have to, because std::swap is provided.
If developing C is beating a dead horse, adding a swap operator is like
grafting a horn onto a dead horse's forehead to make a unicorn.
To use swap in C++ you need to include <algorithm>. Types that provide
swap needs to add a member function swap and a non-member function
swap, so the global swap can be used using name lookup rules. If we
consider swap something fundamental and basic then it also could be
added to C++ as operator.
A less sophisticated swap can be obtained as a simple template
function, which will be more than adequate for basic types:

template <typename T> void swap(T &x, T &y)
{ T temp = x; x = y; y = temp; }
BartC
2016-12-20 14:24:58 UTC
Post by Kaz Kylheku
template <typename T> void swap(T &x, T &y)
{ T temp = x; x = y; y = temp; }
If I just write that swap routine in-line in my own language, my poor
compiler generates 6 instructions to do that swap including having to
write to that temporary.

It would be a lot more if it was in a function and with reference
parameters as it can't in-line them (and doesn't have generics anyway).

However, 'swap' is built-in, and swap(x,y) generates only 4
instructions. And is generic. [On x64 it can be done in 3 instructions
using XCHG, for scalars, but it's much slower.]

So having such features as an integral part of a language makes the
compiler's job much easier. It doesn't need to depend on having
templates or references or being able to in-line functions or having to
optimise the code to eliminate the temporary.

Even if the compiler did all that anyway, you can choose to turn off
optimisations for a quick build but still get the benefit of a fast swap
routine.
--
bartc
Kaz Kylheku
2016-12-20 18:54:44 UTC
Post by BartC
Post by Kaz Kylheku
template <typename T> void swap(T &x, T &y)
{ T temp = x; x = y; y = temp; }
If I just write that swap routine in-line in my own language, my poor
compiler generates 6 instructions to do that swap including having to
write to that temporary.
It would be a lot more if it was in a function and with reference
parameters as it can't in-line them (and doesn't have generics anyway).
However, 'swap' is built-in, and swap(x,y) generates only 4
instructions.
That's a good indication that a priority in the development of
this compiler would be to get better code out of that swap
function, rather than stick in specialized operators.

For one thing, code which already performs a three-point rotation to
swap two variables is not going to rewrite itself to use your custom
operator.
Post by BartC
So having such features as an integral part of a language makes the
compiler's job much easier. It doesn't need to depend on having
templates or references or being able to in-line functions or having to
optimise the code to eliminate the temporary.
Sure, so if the requirements of compiler hobbyists are important,
then the more operators, the merrier.
BartC
2016-12-20 19:39:47 UTC
Post by Kaz Kylheku
Post by BartC
Post by Kaz Kylheku
template <typename T> void swap(T &x, T &y)
{ T temp = x; x = y; y = temp; }
If I just write that swap routine in-line in my own language, my poor
compiler generates 6 instructions to do that swap including having to
write to that temporary.
It would be a lot more if it was in a function and with reference
parameters as it can't in-line them (and doesn't have generics anyway).
However, 'swap' is built-in, and swap(x,y) generates only 4
instructions.
That's a good indication that a priority in the development of
this compiler would be to get better code out of that swap
function, rather than stick in specialized operators.
For one thing, code which already performs a three-point rotation to
swap two variables is not going to rewrite itself to use your custom
operator.
This is how you swap two things, a[f()] and b[g()], in Python:

a[f()], b[g()] = b[g()], a[f()]

If you look at the byte-code, f() and g() are each called twice. That's
undesirable. But even without side-effects, a term such as a[i+j] can
have i+j evaluated twice if no optimiser is available.

And because Python lacks references, doing this via a function is not an
option. A built-in swap feature would be a benefit.

But presumably you would advise dropping everything and getting a better
optimiser into these languages rather than do 1% of the work and just
adding this extra feature.
Post by Kaz Kylheku
Post by BartC
So having such features as an integral part of a language makes the
compiler's job much easier. It doesn't need to depend on having
templates or references or being able to in-line functions or having to
optimise the code to eliminate the temporary.
Sure, so if the requirements of compiler hobbyists are important,
then the more operators, the merrier.
But it's OK to have thousands of features in the standard library of a
bloated language like C++?

It's interesting that exchanging two values is common in instruction
sets, so there it is considered a primitive operation. But it's
apparently not primitive enough in your opinion to be part of a
low-level language.
--
Bartc
David Brown
2016-12-21 12:36:41 UTC
Post by BartC
Post by Kaz Kylheku
Post by BartC
Post by Kaz Kylheku
template <typename T> void swap(T &x, T &y)
{ T temp = x; x = y; y = temp; }
If I just write that swap routine in-line in my own language, my poor
compiler generates 6 instructions to do that swap including having to
write to that temporary.
It would be a lot more if it was in a function and with reference
parameters as it can't in-line them (and doesn't have generics anyway).
However, 'swap' is built-in, and swap(x,y) generates only 4
instructions.
That's a good indication that a priority in the development of
this compiler would be to get better code out of that swap
function, rather than stick in specialized operators.
For one thing, code which already performs a three-point rotation to
swap two variables is not going to rewrite itself to use your custom
operator.
a[f()], b[g()] = b[g()], a[f()]
If you look at the byte-code, f() and g() are each called twice. That's
undesirable.
And that is why, when writing Python code with a view to efficiency, you
don't write code like above. You write:

a_i, b_i = f(), g()
a[a_i], b[b_i] = b[b_i], a[a_i]

and everyone is happy.
Post by BartC
But even without side-effects, a term such as a[i+j] can
have i+j evaluated twice if no optimiser is available.
And because Python lacks references, doing this via a function is not an
option. A built-in swap feature would be a benefit.
A built-in swap function is not needed in Python - primarily because you
can write swaps perfectly well without adding a new "swap feature".
Post by BartC
But presumably you would advise dropping everything and getting a better
optimiser into these languages rather than do 1% of the work and just
adding this extra feature.
If your choice is between adding a new special feature to a language,
and improving the optimiser so that the feature is not needed, then yes
- improve the optimiser. Then people can benefit from it without having
to re-write their code, and the optimiser improvement might help for
other code too.
Post by BartC
Post by Kaz Kylheku
Post by BartC
So having such features as an integral part of a language makes the
compiler's job much easier. It doesn't need to depend on having
templates or references or being able to in-line functions or having to
optimise the code to eliminate the temporary.
Sure, so if the requirements of compiler hobbyists are important,
then the more operators, the merrier.
But it's OK to have thousands of features in the standard library of a
bloated language like C++?
Yes. Adding features to a library is fine - it does not require changes
to the language or the compilers, and you don't need to use the library
if you don't want to.
Post by BartC
It's interesting that exchanging two values is common in instruction
sets, so there it is considered a primitive operation. But it's
apparently not primitive enough in your opinion to be part of a
low-level language.
It is an operation that exists in /some/ instruction sets - not uncommon
in CISC designs, but rare in RISC processors. And there is nothing to
stop a compiler generating such instructions.


It would be nice, IMHO, if C allowed expressions like "a, b = b, a" in
the same way as Python. But it does not allow them - and much as I
dislike the C comma operator, it will not go away.
Wojtek Lerch
2016-12-21 15:07:17 UTC
Permalink
Post by David Brown
It would be nice, IMHO, if C allowed expressions like "a, b = b, a" in
the same way as Python. But it does not allow them - and much as I
dislike the C comma operator, it will not go away.
The syntax wouldn't have to be identical to Python's. Perhaps it could
use braces, like compound literals do:


{ a, b } = { b, a };

Or would the braces be too confusing to compilers?
David Brown
2016-12-21 16:01:43 UTC
Permalink
Post by Wojtek Lerch
Post by David Brown
It would be nice, IMHO, if C allowed expressions like "a, b = b, a" in
the same way as Python. But it does not allow them - and much as I
dislike the C comma operator, it will not go away.
The syntax wouldn't have to be identical to Python's. Perhaps it could
{ a, b } = { b, a };
Or would the braces be too confusing to compilers?
More likely it would be:

[a, b] = [b, a];

which Ben suggested.

This would match the structured bindings in C++17, which uses the format

auto [a, b] = std::make_tuple(x, y);

Using [a, b] to tie a and b together into a tuple would be a nice
addition to either C or C++. Realistically, I don't see it happening in
C - but perhaps in C++.
Tim Rentsch
2016-12-25 18:01:57 UTC
Permalink
Post by Wojtek Lerch
Post by David Brown
It would be nice, IMHO, if C allowed expressions like "a, b = b, a" in
the same way as Python. But it does not allow them - and much as I
dislike the C comma operator, it will not go away.
The syntax wouldn't have to be identical to Python's. Perhaps it
{ a, b } = { b, a };
I would like to see this idea generalized so it could be used
with struct types generally (and perhaps also non-struct types,
but structs are the most important).

When used in an lvalue context (eg, on LHS of an assignment), a
braced set of lvalues would serve to "take apart" a struct.

When used in an rvalue context, a braced set of rvalues would
serve to produce a struct value of a type appropriate to its
context. In a case like the assignment statement shown above,
the RHS would be an "anonymous" struct type, with member types
matching the types of the rvalue expressions.

Used in an lvalue context, it might be convenient to elide or
omit some of the targets:

{ x, y } = three_d_coordinate;
{ x,, z } = three_d_coordinate;

or a form that allows explicit member selection, by analogy with
designated initializers:

{ d = .day, m = .month } = date_today;

Discarding a value altogether could be done with an empty set
of targets:

{} = printf( "Hello world\n" );

which to my eyes is much nicer than casting to void.

The usage in an rvalue context would be exactly like a compound
literal, except with greater flexibility in typing. Using a
designated initializer would serve to narrow the set of types
that could be used, without needing to give a specific type
name:

move_to( { 0, 0, 0 } );
draw_to( { .day = 1 } ); // ERROR!

Here the call to draw_to() gives an error because it expects a
coordinate value, with members x, y, and z. Since this set
does not include 'day' as a member, the types cannot possibly
match.

Besides being useful for things like swapping, a scheme along
these lines makes it very easy to return multiple values from
a function call, in a form that is more convenient than having to
assign to a variable of struct type, and then use members from
that variable to get out the individual values.
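For comparison, here is a rough sketch of how multiple return values have to be written in standard C today, without the braced destructuring proposed above. The names int_divmod, divmod_result and print_divmod are made up for illustration.

```c
#include <stdio.h>

/* Hypothetical result type: quotient and remainder returned together. */
struct divmod_result {
    int quot;
    int rem;
};

/* Returns both values by value in one struct; assumes den != 0. */
struct divmod_result int_divmod(int num, int den)
{
    /* Compound literal with designated initializers (C99 and later). */
    return (struct divmod_result){ .quot = num / den, .rem = num % den };
}

/* The caller must name a struct variable and then pick it apart by hand,
   which is the step the proposed "{ q, m } = ..." form would remove. */
void print_divmod(int num, int den)
{
    struct divmod_result r = int_divmod(num, den);
    int q = r.quot;
    int m = r.rem;
    printf("%d / %d = %d remainder %d\n", num, den, q, m);
}
```

The intermediate variable r exists only to carry the two values from the call to the assignments.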
BartC
2016-12-21 19:44:52 UTC
Permalink
Post by David Brown
Post by BartC
a[f()], b[g()] = b[g()], a[f()]
If you look at the byte-code, f() and g() are each called twice. That's
undesirable.
And that is why, when writing Python code with a view to efficiency,
The problem isn't so much efficiency, but code that does the right thing.

The standard way of swapping terms x and y in Python requires writing
each term twice. That needs extra care. And yes it can be a little less
efficient.
Post by David Brown
Post by BartC
But presumably you would advise dropping everything and getting a better
optimiser into these languages rather than do 1% of the work and just
adding this extra feature.
If your choice is between adding a new special feature to a language,
and improving the optimiser so that the feature is not needed, then yes
- improve the optimiser.
So here you advocate changing the compiler by working on the optimiser ...
Post by David Brown
Post by BartC
But it's OK to have thousands of features in the standard library of a
bloated language like C++?
Yes. Adding features to a library is fine - it does not require changes
to the language or the compilers,
... and here you advocate not changing the compiler! Even though the
task is a couple of orders of magnitude simpler, so could be adopted more easily.

There's a wonderfully fast C compiler called Tiny C, but it generates
poor code. Wouldn't it be great if new language features simply involved
adding a few dozen lines of straightforward code and you'd immediately
get the full benefit? You wouldn't need to use -O3, and you wouldn't be
relying on the vagaries of different compilers' optimisers, which may or
may not be able to recognise the new idiom to get the best code.
Post by David Brown
Post by BartC
It's interesting that exchanging two values is common in instruction
sets, so there it is considered a primitive operation.
It is an operation that exists in /some/ instruction sets - not uncommon
in CISC designs, but rare in RISC processors. And there is nothing to
stop a compiler generating such instructions.
As I said, it suggests that it was considered a primitive operation.
Post by David Brown
It would be nice, IMHO, if C allowed expressions like "a, b = b, a" in
the same way as Python.
That reminds me of a feature I had on an old language which was along
those lines. The above would be written as:

stack a, b
unstack a, b

stack/unstack map to push and pop instructions. There are no type checks
so this could do type-punning (stack a float, unstack to an int).

I no longer use it as it's rather crude, but also because I can't
express it in C.
--
Bartc
David Brown
2016-12-22 09:09:23 UTC
Permalink
Post by BartC
Post by David Brown
Post by BartC
a[f()], b[g()] = b[g()], a[f()]
If you look at the byte-code, f() and g() are each called twice. That's
undesirable.
And that is why, when writing Python code with a view to efficiency,
The problem isn't so much efficiency, but code that does the right thing.
The standard way of swapping terms x and y in Python requires writing
each term twice. That needs extra care. And yes it can be a little less
efficient.
This is /programming/ - it is not ditch-digging. Yes, you need to take
care with what you write and how you write it - the same applies to all
programming, in all programming languages.
Post by BartC
Post by David Brown
Post by BartC
But presumably you would advise dropping everything and getting a better
optimiser into these languages rather than do 1% of the work and just
adding this extra feature.
If your choice is between adding a new special feature to a language,
and improving the optimiser so that the feature is not needed, then yes
- improve the optimiser.
So here you advocate changing the compiler by working on the optimiser ...
I am advocating keeping the language itself simpler and making the best
possible implementation of that language. Even C++ does not add new
/language/ features unless there is a very strong case.
Post by BartC
Post by David Brown
Post by BartC
But it's OK to have thousands of features in the standard library of a
bloated language like C++?
Yes. Adding features to a library is fine - it does not require changes
to the language or the compilers,
... and here you advocate not changing the compiler! Even though the
task is a couple of orders of magnitude simpler, so could be adopted more easily.
I am advocating adding to the /library/, rather than to the /language/.

I think you are having trouble understanding the difference between the
language itself, compilers that implement the language, the library
specifications, and implementations of the library.
Post by BartC
There's a wonderfully fast C compiler called Tiny C, but it generates
poor code. Wouldn't it be great if new language features simply involved
adding a few dozen lines of straightforward code and you'd immediately
get the full benefit? You wouldn't need to use -O3. Instead of relying
on the vagaries of different compilers' optimisers which may or may not
be able to recognise the new idiom to get the best code.
Are you seriously suggesting that we could add enough new features,
operators, and built-in functions to C and then small and simple
compilers would magically generate as efficient code as large and
complex ones?

What's next after "swap" ? Do you want to add an operator that
calculates "4*a + b" just because that fits an x86 addressing mode, and
people could then use it to do efficient multiplication by 5 even on
Tiny C? Personally, I'd rather continue to write "x * 5" and let a good
compiler generate the efficient code on processors that have such
instructions.
Post by BartC
Post by David Brown
Post by BartC
It's interesting that exchanging two values is common in instruction
sets, so there it is considered a primitive operation.
It is an operation that exists in /some/ instruction sets - not uncommon
in CISC designs, but rare in RISC processors. And there is nothing to
stop a compiler generating such instructions.
As I said, it suggests that it was considered a primitive operation.
The fact that it is common on larger CISC ISAs suggests that the chip
designers thought it was useful enough to be worth including. No more
and no less.

And the fact that it is /not/ common on newer processors and in
particular, it is not common on RISC designs, suggests that as an
operation it is not useful enough to be worth the cost of including it
in the processor. (Swap instructions were cheap to implement in single
cpu, single bus master, in-order processors. They are /very/ expensive
to implement in modern multi-core out-of-order processors with
load-store architectures.)
Post by BartC
Post by David Brown
It would be nice, IMHO, if C allowed expressions like "a, b = b, a" in
the same way as Python.
That reminds me of a feature I had on an old language which was along
stack a, b
unstack a, b
stack/unstack map to push and pop instructions. There are no type checks
so this could do type-punning (stack a float, unstack to an int).
If you want to program in Forth, that's fine - but it's a very different
language from C.
Post by BartC
I no longer use it as it's rather crude, but also because I can't
express it in C.
Thiago Adams
2016-12-22 11:24:02 UTC
Permalink
On Thursday, December 22, 2016 at 7:09:25 AM UTC-2, David Brown

[...]
Post by David Brown
I am advocating adding to the /library/, rather than to the /language/.
How to add into the library?
Is it possible to create an efficient swap function without compiler extensions?

(My answer so far is : No it's not possible )
Thiago Adams
2016-12-22 12:10:06 UTC
Permalink
Post by Thiago Adams
On Thursday, December 22, 2016 at 7:09:25 AM UTC-2, David Brown
[...]
Post by David Brown
I am advocating adding to the /library/, rather than to the /language/.
How to add into the library?
Actually, for primitive data types, this could be added today.

swap_int(int *a, int *b)
swap_double(double *a, double *b)
...

swap_array_int(int *a, int * b, size_t s);
swap_array_double(double *a, double * b, size_t s);
...

and then _Generic to select.

The remaining case not covered is struct/union.

One argument against operator = for structs is that it is not something basic. It's not necessary.
This swap magic is similar. It would not be the first time C adds something that is not basic.
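A minimal sketch of the _Generic approach described above, assuming C11. The macro name swap_generic is made up for illustration; only int and double are covered, and structs, unions and arrays are left out, as the post says.

```c
/* Per-type helpers, in the style of the swap_int/swap_double list above. */
static inline void swap_int(int *a, int *b)
{
    int t = *a;
    *a = *b;
    *b = t;
}

static inline void swap_double(double *a, double *b)
{
    double t = *a;
    *a = *b;
    *b = t;
}

/* C11 _Generic picks the helper from the type of the first pointer.
   A pointer type that is not listed is a compile-time error. */
#define swap_generic(a, b) _Generic((a), \
    int *: swap_int,                     \
    double *: swap_double)((a), (b))
```

It would be used as swap_generic(&x, &y). Because the selection looks only at the first argument, passing an int * and a double * together selects swap_int and is then diagnosed as an incompatible pointer argument.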
David Brown
2016-12-22 12:37:59 UTC
Permalink
Post by Thiago Adams
On Thursday, December 22, 2016 at 7:09:25 AM UTC-2, David Brown
[...]
Post by David Brown
I am advocating adding to the /library/, rather than to the /language/.
How to add into the library?
Is it possible to create an efficient swap function without compiler extensions?
(My answer so far is : No it's not possible )
You missed my point. It was fine to add swap to /C++/, because it could
be done efficiently in the library without adding more to the language
(C++ already had templates and references). Standard C does not have
the required language features (though as you point out, C11 _Generic
can handle some cases).

It would not take much addition to C to make it possible to implement
"swap" as a macro (and therefore possibly part of the library). gcc's
"typeof" construct would do it:

#define swap(a, b) \
do { \
typeof(a) t = b; \
b = a; \
a = t; \
} while (0)

Adding "typeof" to the C language would be a lot more useful than adding
a "swap" feature. It would also be fairly easy to implement - gcc and
clang already have it, and most other major C compilers are also C++
compilers and therefore have "decltype" and "auto" - "typeof" could
probably share some of that work.
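One caveat with a macro of this shape is that each argument appears twice, so swap(a[f()], b[g()]) would call f() and g() twice, the same problem raised earlier for the Python idiom. A possible variant, sketched here with the GNU __typeof__ spelling (accepted by GCC and Clang, not part of ISO C11), takes the address of each operand first so that each is evaluated once. The names swap_once, next_index and index_calls are made up for illustration.

```c
/* GNU C extension: __typeof__ does not evaluate its operand
   (unless it has a variably modified type). */
#define swap_once(a, b)                          \
    do {                                         \
        __typeof__(a) *swap_pa_ = &(a);          \
        __typeof__(b) *swap_pb_ = &(b);          \
        __typeof__(a) swap_tmp_ = *swap_pa_;     \
        *swap_pa_ = *swap_pb_;                   \
        *swap_pb_ = swap_tmp_;                   \
    } while (0)

/* Hypothetical helper with a side effect, used only to count evaluations. */
static int index_calls;

static int next_index(void)
{
    return index_calls++;
}
```

With this version, swap_once(a[next_index()], b[next_index()]) calls next_index() exactly twice in total. The cost is that both operands must be addressable, so bit-fields and register variables are excluded.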
Thiago Adams
2016-12-22 13:18:57 UTC
Permalink
Post by David Brown
Post by Thiago Adams
On Thursday, December 22, 2016 at 7:09:25 AM UTC-2, David Brown
[...]
Post by David Brown
I am advocating adding to the /library/, rather than to the /language/.
How to add into the library?
Is it possible to create an efficient swap function without compiler extensions?
(My answer so far is : No it's not possible )
You missed my point. It was fine to add swap to /C++/, because it could
be done efficiently in the library without adding more to the language
(C++ already had templates and references). Standard C does not have
the required language features (though as you point out, C11 _Generic
can handle some cases).
It would not take much addition to C to make it possible to implement
"swap" as a macro (and therefore possibly part of the library). gcc's
#define swap(a, b) \
do { \
typeof(a) t = b; \
b = a; \
a = t; \
} while (0)
Adding "typeof" to the C language would be a lot more useful than adding
a "swap" feature. It would also be fairly easy to implement - gcc and
clang already have it, and most other major C compilers are also C++
compilers and therefore have "decltype" and "auto" - "typeof" could
probably share some of that work.
I think it is a good solution.
Having an alternative, the only argument in favor of swap would be arrays
int a[5];
int b[5];
swap(a, b);
and maybe performance.
David Brown
2016-12-23 09:13:55 UTC
Permalink
Post by Thiago Adams
Post by David Brown
Post by Thiago Adams
On Thursday, December 22, 2016 at 7:09:25 AM UTC-2, David Brown
[...]
Post by David Brown
I am advocating adding to the /library/, rather than to the /language/.
How to add into the library?
Is it possible to create an efficient swap function without compiler extensions?
(My answer so far is : No it's not possible )
You missed my point. It was fine to add swap to /C++/, because it could
be done efficiently in the library without adding more to the language
(C++ already had templates and references). Standard C does not have
the required language features (though as you point out, C11 _Generic
can handle some cases).
It would not take much addition to C to make it possible to implement
"swap" as a macro (and therefore possibly part of the library). gcc's
#define swap(a, b) \
do { \
typeof(a) t = b; \
b = a; \
a = t; \
} while (0)
Adding "typeof" to the C language would be a lot more useful than adding
a "swap" feature. It would also be fairly easy to implement - gcc and
clang already have it, and most other major C compilers are also C++
compilers and therefore have "decltype" and "auto" - "typeof" could
probably share some of that work.
I think it is a good solution.
Having an alternative, the only argument in favor of swap would be arrays
int a[5];
int b[5];
swap(a, b);
and maybe performance.
The "performance" argument is moot - a modern optimising compiler will
generate the best possible swap code given the swap macro above
(assuming support for "typeof").

As for swapping arrays - well, arrays are different from other types in
C. You can't assign them, compare them or pass them as values (unless
you wrap them in a struct). It would be very strange and inconsistent
to be able to swap them.

(I'd be happy if they /were/ treated like other types, and then be
"swappable". But that's just not the way arrays work in C, and that is
not going to change.)

For swapping arrays, the best idea IMHO is a "memswap" function that
someone suggested earlier.
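A minimal sketch of what such a "memswap"-style function could look like. It is not part of the standard library, and it is called swap_bytes here because identifiers starting with "mem" followed by a lowercase letter are reserved for future library use. The two regions are assumed not to overlap.

```c
#include <stddef.h>

/* Exchanges n bytes between two non-overlapping objects, one byte at a
   time. Like memcpy, it treats the data as an anonymous block of bytes. */
void swap_bytes(void *restrict p, void *restrict q, size_t n)
{
    unsigned char *a = p;
    unsigned char *b = q;

    while (n--) {
        unsigned char t = *a;
        *a++ = *b;
        *b++ = t;
    }
}
```

For the arrays in the example above, the call would be swap_bytes(a, b, sizeof a). A production version would probably swap in larger chunks, but an optimising compiler may already vectorise this loop.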
Ben Bacarisse
2016-12-22 13:23:11 UTC
Permalink
David Brown <***@hesbynett.no> writes:
<snip>
Post by David Brown
It would not take much addition to C to make it possible to implement
"swap" as a macro (and therefore possibly part of the library). gcc's
#define swap(a, b) \
do { \
typeof(a) t = b; \
b = a; \
a = t; \
} while (0)
Adding "typeof" to the C language would be a lot more useful than adding
a "swap" feature. It would also be fairly easy to implement - gcc and
clang already have it, and most other major C compilers are also C++
compilers and therefore have "decltype" and "auto" - "typeof" could
probably share some of that work.
I agree about the value of typeof() but it doesn't solve all the
problems of writing swap. You'd like to rule out accidents like

int x;
float y;
...
swap(x, y);

While it's not obviously wrong to write that, it probably does indicate
a mistake. A static assert on the sizes being equal would help but does
not address the main issue. I suppose one could have a new form of
constant expression: typeof(a) == typeof(b).
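With GCC or Clang, something close to that check can already be sketched using the __builtin_types_compatible_p extension together with C11 _Static_assert. This is an illustration under those assumptions, not portable ISO C, and the name swap_checked is made up.

```c
/* Rejects swap_checked(x, y) at compile time when x and y have different
   types. __builtin_types_compatible_p and __typeof__ are GNU extensions;
   the builtin ignores top-level qualifiers such as const. */
#define swap_checked(a, b)                                              \
    do {                                                                \
        _Static_assert(                                                 \
            __builtin_types_compatible_p(__typeof__(a), __typeof__(b)), \
            "swap_checked: operands must have the same type");          \
        __typeof__(a) swap_tmp_ = (a);                                  \
        (a) = (b);                                                      \
        (b) = swap_tmp_;                                                \
    } while (0)
```

With int x and float y, swap_checked(x, y) fails to compile, while two ints or two structs of the same type are accepted.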
--
Ben.
BartC
2016-12-22 15:31:26 UTC
Permalink
Post by Ben Bacarisse
<snip>
Post by David Brown
It would not take much addition to C to make it possible to implement
"swap" as a macro (and therefore possibly part of the library). gcc's
#define swap(a, b) \
do { \
typeof(a) t = b; \
b = a; \
a = t; \
} while (0)
Adding "typeof" to the C language would be a lot more useful than adding
a "swap" feature. It would also be fairly easy to implement - gcc and
clang already have it, and most other major C compilers are also C++
compilers and therefore have "decltype" and "auto" - "typeof" could
probably share some of that work.
I agree about the value of typeof() but it doesn't solve all the
problems of writing swap. You'd like to rule out accidents like
int x;
float y;
...
swap(x, y);
While it's not obviously wrong to write that, it probably does indicate
a mistake. A static assert on the sizes being equal would help but does
not address the main issue.
You raise a good point about the actual meaning of swap().

Usually it would exchange two values of the same type and size.

But some might expect behaviour like this:

temp=x; x=y; y=temp;

Where x and y are of different types and/or widths (char and long, or
int and float); and temp might be the same type as x or y, or something
else entirely, perhaps that can accommodate either x or y value.

I don't think this is practical to cater for in a built-in or library
swap(), so let people just keep on coding it explicitly.
Post by Ben Bacarisse
I suppose one could have a new form of
constant expression: typeof(a) == typeof(b).
I implement typeof() [in some other language], and also allow types,
including typeof(), to be used in expressions; they just yield an integer.

So typeof(a) == typeof(b) can actually be done, allowing:

int x=typeof(a);
...
switch (x) {
case int:
case double:

etc. in actual code not just generics.
--
Bartc
Thiago Adams
2016-12-22 16:08:37 UTC
Permalink
Post by BartC
Post by Ben Bacarisse
<snip>
Post by David Brown
It would not take much addition to C to make it possible to implement
"swap" as a macro (and therefore possibly part of the library). gcc's
#define swap(a, b) \
do { \
typeof(a) t = b; \
b = a; \
a = t; \
} while (0)
Adding "typeof" to the C language would be a lot more useful than adding
a "swap" feature. It would also be fairly easy to implement - gcc and
clang already have it, and most other major C compilers are also C++
compilers and therefore have "decltype" and "auto" - "typeof" could
probably share some of that work.
I agree about the value of typeof() but it doesn't solve all the
problems of writing swap. You'd like to rule out accidents like
int x;
float y;
...
swap(x, y);
While it's not obviously wrong to write that, it probably does indicate
a mistake. A static assert on the sizes being equal would help but does
not address the main issue.
You raise a good point about the actual meaning of swap().
Usually it would exchange two values of the same type and size.
temp=x; x=y; y=temp;
Where x and y are of different types and/or widths (char and long, or
int and float); and temp might be the same type as x or y, or something
else entirely, perhaps that can accommodate either x or y value.
I don't think this is practical to cater for in a built-in or library
swap(), so let people just keep on coding it explicitly.
The same problem applies for operator =.
The assignment operator destroys the state of your object, it's just memcpy.
So we sometimes need to create a meaningful "set" function to change state in a proper way.
In some cases "swap" needs to be customized or not used, just like the assignment needs to be avoided sometimes.

The assignment is really special in C.
It's the only case where a struct behaves like a primitive type.

The operators == , != don't have the same luck.

struct S s1;
struct S s2;

s1 = s2; // ok - memcpy
s1 == s2; // not ok - memcmp
BartC
2016-12-22 16:22:33 UTC
Permalink
Post by Thiago Adams
Post by BartC
temp=x; x=y; y=temp;
Where x and y are of different types and/or widths (char and long, or
int and float); and temp might be the same type as x or y, or something
else entirely, perhaps that can accommodate either x or y value.
The same problem applies for operator =.
The assignment operator destroys the state of your object, it's just memcpy.
Assignment is easier to deal with. With:

x = y;

there might be some conversion from y to x. But it's in one direction.
However, swap is bi-directional! I suppose this could be done:

T x, tempx;
U y, tempy;

tempx = x;
tempy = y;
x = tempy;
y = tempx;

provided T and U are compatible.
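As a concrete illustration of the two-temporary scheme above, here is a sketch with T = int and U = double. The function name swap_int_double is made up; the point is that each direction goes through an ordinary assignment conversion, so the round trip can lose information.

```c
/* Swaps an int with a double using one temporary per operand.
   Assumes the double's value is within the range of int; converting an
   out-of-range value to int is undefined behaviour. */
void swap_int_double(int *x, double *y)
{
    int tempx = *x;
    double tempy = *y;

    *x = (int)tempy;  /* double -> int: the fractional part is discarded */
    *y = tempx;       /* int -> double: exact for typical 32-bit int */
}
```

Swapping 7 and 2.75 this way leaves 2 and 7.0, so swapping back does not restore the original values.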
Post by Thiago Adams
The operators == , != don't have the same luck.
struct S s1;
struct S s2;
s1 = s2; // ok - memcpy
s1 == s2; // not ok - memcmp
I didn't know that. But why doesn't "==" work between structs?

I can understand that uninitialised padding bytes inside a struct can
cause problems, but if people have to resort to memcmp, then they will
have the same problems. No-one wants to have to compare structs
member-by-member.

(I have a project that targets C code which I now have to change as it
generates "==" for structs. Obviously it's never come up before!)
--
Bartc
Jakob Bohm
2016-12-22 16:35:28 UTC
Permalink
Post by Kaz Kylheku
Post by Thiago Adams
Post by BartC
temp=x; x=y; y=temp;
Where x and y are of different types and/or widths (char and long, or
int and float); and temp might be the same type as x or y, or something
else entirely, perhaps that can accommodate either x or y value.
The same problem applies for operator =.
The assignment operator destroys the state of your object, it's just memcpy.
x = y;
there might be some conversion from y to x. But it's in one direction.
T x, tempx;
U y, tempy;
tempx = x;
tempy = y;
x = tempy;
y = tempx;
provided T and U are compatible.
Post by Thiago Adams
The operators == , != don't have the same luck.
struct S s1;
struct S s2;
s1 = s2; // ok - memcpy
s1 == s2; // not ok - memcmp
I didn't know that. But why doesn't "==" work between structs?
I can understand that uninitialised padding bytes inside a struct can
cause problems, but if people have to resort to memcmp, then they will
have the same problems. No-one wants to have to compare structs
member-by-member.
(I have a project that targets C code which I now have to change as it
generates "==" for structs. Obviously it's never come up before!)
The mechanisms and documentation for assigning structs are needed to
pass them by value. None of the other operations are similarly needed
by a core language feature, and were thus deemed superfluous in a
deliberately minimalist language.

Also for C (as opposed to C++) types, there is no semantic difference
between a memcpy()-style assignment and a member-by-member assignment,
because the member types will be the same (not just
assignment-compatible), and any padding bits/bytes are not supposed to
matter. If some program wants them to matter, it should give them names
as fields of their own.

At least one compiler (GCC) claims in its documentation to have
optimizations that process the struct members separately within a
function, thus possibly not even storing the padding bits/bytes in
those cases.


Enjoy

Jakob
--
Jakob Bohm, CIO, Partner, WiseMo A/S. https://www.wisemo.com
Transformervej 29, 2860 Søborg, Denmark. Direct +45 31 13 16 10
This public discussion message is non-binding and may contain errors.
WiseMo - Remote Service Management for PCs, Phones and Embedded
David Brown
2016-12-23 10:06:07 UTC
Permalink
Post by Jakob Bohm
Post by Kaz Kylheku
Post by Thiago Adams
Post by BartC
temp=x; x=y; y=temp;
Where x and y are of different types and/or widths (char and long, or
int and float); and temp might be the same type as x or y, or something
else entirely, perhaps that can accommodate either x or y value.
The same problem applies for operator =.
The assignment operator destroys the state of your object, it's just memcpy.
x = y;
there might be some conversion from y to x. But it's in one direction.
T x, tempx;
U y, tempy;
tempx = x;
tempy = y;
x = tempy;
y = tempx;
provided T and U are compatible.
Post by Thiago Adams
The operators == , != don't have the same luck.
struct S s1;
struct S s2;
s1 = s2; // ok - memcpy
s1 == s2; // not ok - memcmp
I didn't know that. But why doesn't "==" work between structs?
I can understand that uninitialised padding bytes inside a struct can
cause problems, but if people have to resort to memcmp, then they will
have the same problems. No-one wants to have to compare structs
member-by-member.
(I have a project that targets C code which I now have to change as it
generates "==" for structs. Obviously it's never come up before!)
The mechanisms and documentation for assigning structs is needed to
pass them by value. None of the other operations are similarly needed
by a core language feature, and were thus deemed superfluous in a
deliberately minimalist language.
Also for C (as opposed to C++) types, there is no semantic difference
between a memcpy()-style assignment and a member-by-member assignment,
because the member types will be the same (not just assignment-
compatible), and any padding bits/bytes are not supposed to matter, if
some program wants them to matter, it should give them names as fields
of their own.
On the other hand, comparison of structs would have to be done
member-wise in many cases, because a memcmp would also compare padding
bits/bytes, and I believe it is possible for floating point numbers to
be semantically equal but with different bit representations. It would
be nice to be able to compare structs directly in C, but I can
understand why the feature is missing.
Post by Jakob Bohm
At least one compiler (GCC) claims in its documentation to have
optimizations that process the struct members separately within a
function, thus possibly not even storing the padding bits/bytes in
those cases.
This is covered by the "as-if" rule. I don't think gcc is alone in
splitting up structs like this - it is especially common when the struct
is a local variable. The different parts of the struct can be treated
separately, kept in different registers, optimised or eliminated
independently, etc.
Post by Jakob Bohm
Enjoy
Jakob
Philip Lantz
2016-12-24 18:55:37 UTC
Permalink
... comparison of structs would have to be done
member-wise in many cases, because a memcmp would also compare padding
bits/bytes, and I believe it is possible for floating point numbers to
be semantically equal but with different bit representations.
As well as semantically unequal but with identical bit representations.
Keith Thompson
2016-12-22 17:09:42 UTC
Permalink
BartC <***@freeuk.com> writes:
[...]
Post by BartC
I didn't know that. But why doesn't "==" work between structs?
I can understand that uninitialised padding bytes inside a struct can
cause problems, but if people have to resort to memcmp, then they will
have the same problems. No-one wants to have to compare structs
member-by-member.
But sometimes that's exactly what you need to do. If you use memcmp(),
you risk getting incorrect results if there are padding bytes. And
the meaning of *logical* equality for structures depends on the meaning
of the values. It's common for structures to contain arrays some of
whose elements are irrelevant, or pointers where you care about equality
of the objects they point to, not of the pointers themselves. Think
about a dynamic string type, for example. Floating-point members on an
implementation that supports NaNs could also be tricky.

Byte-by-byte equality is not safe or particularly useful, and we already
have memcmp(). Member-by-member equality is more likely to be useful,
but hasn't been considered useful enough to build it into the language.
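To make the dynamic-string case concrete, here is a minimal sketch of a hand-written equality function. The DynStr type and dynstr_equal are invented for illustration and do not come from any particular library; the point is only that logical equality looks at the pointed-to bytes and ignores the pointer value and the capacity.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical dynamic string: only the first len bytes of data matter. */
typedef struct {
    size_t len;   /* number of bytes in use */
    size_t cap;   /* allocated capacity; irrelevant to equality */
    char  *data;  /* storage; two equal strings may live at different addresses */
} DynStr;

/* Logical equality: same length and same contents. */
static bool dynstr_equal(const DynStr *a, const DynStr *b)
{
    return a->len == b->len &&
           (a->len == 0 || memcmp(a->data, b->data, a->len) == 0);
}
```

Two DynStr objects holding "abc" in different buffers compare equal here, while memcmp() on the structs themselves would report a difference because the data pointers (and possibly cap) differ.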
Post by BartC
(I have a project that targets C code which I now have to change as it
generates "==" for structs. Obviously it's never come up before!)
--
Keith Thompson (The_Other_Keith) kst-***@mib.org <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
BartC
2016-12-22 17:22:18 UTC
Permalink
Post by Keith Thompson
[...]
Post by BartC
I didn't know that. But why doesn't "==" work between structs?
I can understand that uninitialised padding bytes inside a struct can
cause problems, but if people have to resort to memcpy, then they will
have the same problems. No-one wants to have to compare structs
member-by-member.
But sometimes that's exactly what you need to do. If you use memcpy(),
you risk getting incorrect results if there are padding bytes. And
the meaning of *logical* equality for structures depends on the meaning
of the values. It's common for structures to contain arrays some of
whose elements are irrelevant, or pointers where you care about equality
of the objects they point to, not of the pointers themselves. Think
about a dynamic string type, for example. Floating-point members on an
implementation that supports NaNs could also be tricky.
Byte-by-byte equality is not safe or particularly useful, and we already
have memcmp(). Member-by-member equality is more likely to be useful,
but hasn't been considered useful enough to build it into the language.
So why not have defined == to be equivalent to memcpy between structs?
Then you get != for free.

Or you can use == between opaque types without needing to know if they
are scalars or structs.

And given:

typedef struct {char r,g,b,a;} Pixel;

it seems overkill to write:

if (memcpy(x,RED,sizeof x))

to compare what is more than likely to be a single 32-bit value, when
you could just as easily have written:

if (x == RED)
--
Bartc
Keith Thompson
2016-12-22 17:48:07 UTC
Permalink
[...]
Post by BartC
Post by Keith Thompson
Byte-by-byte equality is not safe or particularly useful, and we already
have memcmp(). Member-by-member equality is more likely to be useful,
but hasn't been considered useful enough to build it into the language.
So why not have defined == to be equivalent to memcpy between structs?
Then you get != for free.
I presume you mean memcmp, and I already answered that: because
byte-by-byte equality is not safe or particularly useful.
Post by BartC
Or you can use == between opaque types without needing to know if they
are scalars or structs.
Sure, but how often is "==" meaningful for opaque types? Certainly it
*can* be meaningful in some cases, but then you can define an equality
function that accounts for the semantics of the type. (What would it
mean for two FILE objects to be equal?)
Post by BartC
typedef struct {char r,g,b,a;} Pixel;
(Digression: unsigned char would make a lot more sense.)
Post by BartC
if (memcpy(x,RED,sizeof x))
I think you mean

if (memcmp(x,RED,sizeof x) == 0)
Post by BartC
to compare what is more than likely to be a single 32-bit value, when
if (x == RED)
That's particularly convenient because the structure happens to be the
same size as a 32-bit integer. Suppose you don't have the "a" member:

typedef struct {unsigned char r, g, b;} Pixel;

and the compiler pads the structure to 32 bits. What exactly would
(x == RED) mean in that case?

I agree that it would be nice to be able to write (x == RED), but if
you need to add a new language feature to do it, you can't just gloss
over the details.

If you really wanted to, you could have
typedef uint32_t Pixel;
and define macros to set and extract the r, g, b, and a members;
then "==" and "!=" would work. Or you could write an equality
function or macro.
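A rough sketch of that uint32_t approach follows. The macro names and the byte layout (r in the low byte, a in the high byte) are assumptions picked for illustration; nothing in the thread fixes them.

```c
#include <stdint.h>

typedef uint32_t Pixel;

/* Assumed layout: r in bits 0-7, g in 8-15, b in 16-23, a in 24-31. */
#define PIXEL_MAKE(r, g, b, a) \
    ((Pixel)(((uint32_t)(uint8_t)(r))       | \
             ((uint32_t)(uint8_t)(g) << 8)  | \
             ((uint32_t)(uint8_t)(b) << 16) | \
             ((uint32_t)(uint8_t)(a) << 24)))

/* Extract one channel. */
#define PIXEL_R(p) ((uint8_t)((p) & 0xFFu))
#define PIXEL_G(p) ((uint8_t)(((p) >> 8) & 0xFFu))
#define PIXEL_B(p) ((uint8_t)(((p) >> 16) & 0xFFu))
#define PIXEL_A(p) ((uint8_t)(((p) >> 24) & 0xFFu))

/* Replace the red channel, leaving the other three untouched. */
#define PIXEL_SET_R(p, v) ((p) = ((p) & ~(Pixel)0xFFu) | (uint8_t)(v))

#define RED PIXEL_MAKE(255, 0, 0, 255)
```

With this, (x == RED) is an ordinary integer comparison, at the cost of losing the x.r member syntax.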

I'm not arguing that struct equality is either impossible or useless,
just that it's not as simple as you seem to be saying it is.
(And even if all the details were resolved, I doubt that it would
be added to the language.)
BartC
2016-12-22 19:01:52 UTC
Permalink
Post by Keith Thompson
Post by BartC
typedef struct {char r,g,b,a;} Pixel;
if (memcpy(x,RED,sizeof x))
I think you mean
if (memcmp(x,RED,sizeof x) == 0)
I'd forgotten for a minute exactly how memcpy works. This reminds me
that it doesn't just check if one block of memory exactly matches
another, but also whether one is greater or less than the other.

Such a compare is not meaningful for structs. But I don't know if
there's a mem- function that compares only for equal/not equal.
Post by Keith Thompson
Post by BartC
to compare what is more than likely to be a single 32-bit value, when
if (x == RED)
That's particularly convenient because the structure happens to be the
same size as a 32-bit integer.
It might be worth forcing a struct to be 4 bytes just so things work out
better. But it can't be done if it represents an actual 3-byte value
embedded in other data.
Post by Keith Thompson
typedef struct {unsigned char r, g, b;} Pixel;
and the compiler pads the structure to 32 bits. What exactly would
(x == RED) mean in that case?
Then it's a 3-byte compare, and gets messier for the code generator.
Post by Keith Thompson
I agree that it would be nice to be able to write (x == RED), but if
you need to add a new language feature to do it, you can't just gloss
over the details.
If you really wanted to, you could have
typedef uint32_t Pixel;
and define macros to set and extract the r, g, b, and a members;
then "==" and "!=" would work. Or you could write an equality
function or macro.
In my own code it does often end up as a 32-bit int type.
--
Bartc
Keith Thompson
2016-12-22 19:24:33 UTC
Permalink
Post by BartC
Post by Keith Thompson
Post by BartC
typedef struct {char r,g,b,a;} Pixel;
if (memcpy(x,RED,sizeof x))
I think you mean
if (memcmp(x,RED,sizeof x) == 0)
I'd forgotten for a minute exactly how memcpy works. This reminds me
------
Post by BartC
that it doesn't just check if one block of memory exactly matches
another, but also whether one is greater or less than the other.
memcpy (memory *copy*) and memcmp (memory *compare*) are two different
functions with very different semantics. You keep using the wrong name.
Post by BartC
Such a compare is not meaningful for structs. But I don't know if
there's a mem- function that compares only for equal/not equal.
memcmp() compares byte-by-byte and returns an int value less than, equal
to, or greater than 0. You can compare two chunks of memory for
bytewise equality by (memcmp(x, y, sizeof x) == 0); for inequality,
change the "==" to "!=". No, there's no standard comparison function
that doesn't distinguish between less-than and greater-than, but it's
trivially easy to ignore the distinction if you don't need it.
Post by BartC
Post by Keith Thompson
Post by BartC
to compare what is more than likely to be a single 32-bit value, when
if (x == RED)
That's particularly convenient because the structure happens to be the
same size as a 32-bit integer.
It might be worth forcing a struct to be 4 bytes just so things work out
better. But it can't be done if it represents an actual 3-byte value
embedded in other data.
Post by Keith Thompson
typedef struct {unsigned char r, g, b;} Pixel;
and the compiler pads the structure to 32 bits. What exactly would
(x == RED) mean in that case?
Then it's a 3-byte compare, and gets messier for the code generator.
Sure, but it gets messier how? What would the semantics of such an "=="
operator be?
Post by BartC
Post by Keith Thompson
I agree that it would be nice to be able to write (x == RED), but if
you need to add a new language feature to do it, you can't just gloss
over the details.
If you really wanted to, you could have
typedef uint32_t Pixel;
and define macros to set and extract the r, g, b, and a members;
then "==" and "!=" would work. Or you could write an equality
function or macro.
In my own code it does often end up as a 32-bit int type.
Ok, that's nice. But if you're talking about changing the language to
allow "==" on structs, you're *still* ignoring the details.
BartC
2016-12-22 19:37:25 UTC
Permalink
Post by Keith Thompson
Post by BartC
Such a compare is not meaningful for structs. But I don't know if
there's a mem- function that compares only for equal/not equal.
memcmp() compares byte-by-byte and returns an int value less than, equal
to, or greater than 0. You can compare two chunks of memory for
bytewise equality by (memcmp(x, y, sizeof x) == 0); for inequality,
change the "==" to "!=". No, there's no standard comparison function
that doesn't distinguish between less-than and greater-than, but it's
trivially easy to ignore the distinction if you don't need it.
I was thinking of the extra effort needed to do a relative compare and
to return one of -1, 0 or 1 instead of just 1 or 0.
Post by Keith Thompson
Post by BartC
Then it's a 3-byte compare, and gets messier for the code generator.
Sure, but it gets messier how? What would the semantics of such an "=="
operator be?
What else would it be other than checking of 3 bytes all being
respectively equal to 3 other bytes?

(In one of my dynamic languages, I support two types of records. One is
a high level one where a field can be any dynamic type; that requires a
recursive field-by-field compare. The other is a C-like 'flat' struct
which is simply compared as a bunch of bytes. And looking at the code,
it's just the equivalent of:

memcmp(p,q,n)==0

At least it wasn't memcpy this time!)
--
bartc
Keith Thompson
2016-12-22 19:57:03 UTC
Permalink
Post by BartC
Post by Keith Thompson
Post by BartC
Such a compare is not meaningful for structs. But I don't know if
there's a mem- function that compares only for equal/not equal.
memcmp() compares byte-by-byte and returns an int value less than, equal
to, or greater than 0. You can compare two chunks of memory for
bytewise equality by (memcmp(x, y, sizeof x) == 0); for inequality,
change the "==" to "!=". No, there's no standard comparison function
that doesn't distinguish between less-than and greater-than, but it's
trivially easy to ignore the distinction if you don't need it.
I was thinking of the extra effort needed to do a relative compare and
to return one of -1, 0 or 1 instead of just 1 or 0.
The extra effort is minimal; a simple implementation can just return the
result of subtracting the corresponding elements. A memequal() function
might be marginally faster, and if you want to advocate adding it go
ahead, but I don't think it's worth it.
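For comparison, here is a sketch of both variants. These are simplified illustrations, not how any real C library implements memcmp(); the names bytes_compare and bytes_equal are made up (names starting with "mem" are reserved for the library, so they are avoided here).

```c
#include <stdbool.h>
#include <stddef.h>

/* Ordering compare in the style of memcmp(): negative, zero, or positive. */
static int bytes_compare(const void *s1, const void *s2, size_t n)
{
    const unsigned char *p = s1, *q = s2;
    for (size_t i = 0; i < n; i++) {
        if (p[i] != q[i])
            return p[i] - q[i];  /* operands promote to int, so the sign is right */
    }
    return 0;
}

/* Equality-only variant: the loop is the same, only the return differs. */
static bool bytes_equal(const void *s1, const void *s2, size_t n)
{
    const unsigned char *p = s1, *q = s2;
    while (n--) {
        if (*p++ != *q++)
            return false;
    }
    return true;
}
```

Both functions stop at the first mismatching byte; the ordering version only pays for one extra subtraction at that point.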
Post by BartC
Post by Keith Thompson
Post by BartC
Then it's a 3-byte compare, and gets messier for the code generator.
Sure, but it gets messier how? What would the semantics of such an "=="
operator be?
What else would it be other than checking of 3 bytes all being
respectively equal to 3 other bytes?
It could also compare the entire structure, including any padding bytes.
If that's not what you mean, that's fine, but it still needs to be
specified. (Which I suppose you already did by calling it a "3-byte
compare".)

Are you proposing a change to the language, to permit "==" for
structures? If so, do you want to define it recursively in terms of
comparisons of all the members? (Which introduces some complexity for
floating-point NaNs, bit-fields, and gaps between members.)

In my opinion, byte-by-byte comparison of structures is *usually* not
meaningful (and we already have memcmp()). Member-by-member comparison
is more likely to be useful, but not enough so to justify adding it as a
built-in language feature.
Jakob Bohm
2016-12-27 16:25:50 UTC
Permalink
Post by Keith Thompson
Post by BartC
Post by Keith Thompson
Post by BartC
Such a compare is not meaningful for structs. But I don't know if
there's a mem- function that compares only for equal/not equal.
memcmp() compares byte-by-byte and returns an int value less than, equal
to, or greater than 0. You can compare two chunks of memory for
bytewise equality by (memcmp(x, y, sizeof x) == 0); for inequality,
change the "==" to "!=". No, there's no standard comparison function
that doesn't distinguish between less-than and greater-than, but it's
trivially easy to ignore the distinction if you don't need it.
I was thinking of the extra effort needed to do a relative compare and
to return one of -1, 0 or 1 instead of just 1 or 0.
The extra effort is minimal; a simple implementation can just return the
result of subtracting the corresponding elements. A memequal() function
might be marginally faster, and if you want to advocate adding it go
ahead, but I don't think it's worth it.
Post by BartC
Post by Keith Thompson
Post by BartC
Then it's a 3-byte compare, and gets messier for the code generator.
Sure, but it gets messier how? What would the semantics of such an "=="
operator be?
What else would it be other than checking of 3 bytes all being
respectively equal to 3 other bytes?
It could also compare the entire structure, including any padding bytes.
If that's not what you mean, that's fine, but it still needs to be
specified. (Which I suppose you already did by calling it a "3-byte
compare".)
Are you proposing a change to the language, to permit "==" for
structures? If so, do you want to define it recursively in terms of
comparisons of all the members? (Which introduces some complexity for
floating-point NaNs, bit-fields, and gaps between members.)
In my opinion, byte-by-byte comparison of structures is *usually* not
meaningful (and we already have memcmp()). Member-by-member comparison
is more likely to be useful, but not enough so to justify adding it as a
built-in language feature.
In my experience, byte-by-byte comparison of structures is useful only
if the structures were carefully designed with this in mind. But once
so designed, it can be highly efficient in terms of both source code
and implementation performance.
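As a sketch of what "designed with this in mind" could look like: use only fixed-width integer members, order them so no padding is needed, and let the compiler check that assumption. The Key type and key_equal are hypothetical names for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record laid out so that no padding is needed:
   4 + 2 + 2 bytes, widest member first. */
typedef struct {
    uint32_t id;
    uint16_t kind;
    uint16_t flags;
} Key;

/* Fail the build if the compiler inserted padding after all. */
static_assert(sizeof(Key) == sizeof(uint32_t) + 2 * sizeof(uint16_t),
              "Key must not contain padding bytes");

/* With no padding and no floating-point or pointer members,
   byte-wise equality coincides with member-wise equality. */
static bool key_equal(const Key *a, const Key *b)
{
    return memcmp(a, b, sizeof *a) == 0;
}
```

The static assertion is the important part: if a later change to the struct introduces padding, the build breaks instead of the comparison silently going wrong.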

Adding member-by-member comparison would also entail somehow declaring
the ordering (greater or less) for each field or field type, and to
declare the order of comparison significance (should {r, g, b, a}
structures be compared and sorted first by r, then by g etc. or the
other way round).

Adding such notation to C would be quite complex, and is a major
feature of basic C++, so no point in adding this to C.


Enjoy

Jakob
--
Jakob Bohm, CIO, Partner, WiseMo A/S. https://www.wisemo.com
Transformervej 29, 2860 Søborg, Denmark. Direct +45 31 13 16 10
This public discussion message is non-binding and may contain errors.
WiseMo - Remote Service Management for PCs, Phones and Embedded
Thiago Adams
2016-12-27 16:42:44 UTC
Permalink
Post by Jakob Bohm
Post by Keith Thompson
Post by BartC
Post by Keith Thompson
Post by BartC
Such a compare is not meaningful for structs. But I don't know if
there's a mem- function that compares only for equal/not equal.
memcmp() compares byte-by-byte and returns an int value less than, equal
to, or greater than 0. You can compare two chunks of memory for
bytewise equality by (memcmp(x, y, sizeof x) == 0); for inequality,
change the "==" to "!=". No, there's no standard comparison function
that doesn't distinguish between less-than and greater-than, but it's
trivially easy to ignore the distinction if you don't need it.
I was thinking of the extra effort needed to do a relative compare and
to return one of -1, 0 or 1 instead of just 1 or 0.
The extra effort is minimal; a simple implementation can just return the
result of subtracting the corresponding elements. A memequal() function
might be marginally faster, and if you want to advocate adding it go
ahead, but I don't think it's worth it.
Post by BartC
Post by Keith Thompson
Post by BartC
Then it's a 3-byte compare, and gets messier for the code generator.
Sure, but it gets messier how? What would the semantics of such an "=="
operator be?
What else would it be other than checking of 3 bytes all being
respectively equal to 3 other bytes?
It could also compare the entire structure, including any padding bytes.
If that's not what you mean, that's fine, but it still needs to be
specified. (Which I suppose you already did by calling it a "3-byte
compare".)
Are you proposing a change to the language, to permit "==" for
structures? If so, do you want to define it recursively in terms of
comparisons of all the members? (Which introduces some complexity for
floating-point NaNs, bit-fields, and gaps between members.)
In my opinion, byte-by-byte comparison of structures is *usually* not
meaningful (and we already have memcmp()). Member-by-member comparison
is more likely to be useful, but not enough so to justify adding it as a
built-in language feature.
In my experience, byte-by-byte comparison of structures is useful only
if the structures were carefully designed with this in mind. But once
so designed, it can be highly efficient in terms of both source code
and implementation performance.
Adding member-by-member comparison would also entail somehow declaring
the ordering (greater or less) for each field or field type, and to
declare the order of comparison significance (should {r, g, b, a}
structures be compared and sorted first by r, then by g etc. or the
other way round).
Adding such notation to C would be quite complex, and is a major
feature of basic C++, so no point in adding this to C.
I don't understand the comparisons with C++.
C++ doesn't have "generated" operators ==, !=, > etc.
C++ does have a generated operator = (just like C).

The feature that C++ has is operator overloading but I think this can be considered separately from "auto-generated" operators.

If the operator == were generated automatically then in C++ we would have
one of the following three forms:

class X
{
    // use the default
    bool operator == (const X& other) = default;

    // remove the default
    bool operator == (const X& other) = delete;

    // custom
    bool operator == (const X& other)
    {

    }
};

The only difference of C++ and C for == is that in C++ you can override the syntax.
Thiago Adams
2016-12-29 15:10:45 UTC
Permalink
Post by Thiago Adams
Post by Jakob Bohm
Post by Keith Thompson
Post by BartC
Post by Keith Thompson
Post by BartC
Such a compare is not meaningful for structs. But I don't know if
there's a mem- function that compares only for equal/not equal.
memcmp() compares byte-by-byte and returns an int value less than, equal
to, or greater than 0. You can compare two chunks of memory for
bytewise equality by (memcmp(x, y, sizeof x) == 0); for inequality,
change the "==" to "!=". No, there's no standard comparison function
that doesn't distinguish between less-than and greater-than, but it's
trivially easy to ignore the distinction if you don't need it.
I was thinking of the extra effort needed to do a relative compare and
to return one of -1, 0 or 1 instead of just 1 or 0.
The extra effort is minimal; a simple implementation can just return the
result of subtracting the corresponding elements. A memequal() function
might be marginally faster, and if you want to advocate adding it go
ahead, but I don't think it's worth it.
Post by BartC
Post by Keith Thompson
Post by BartC
Then it's a 3-byte compare, and gets messier for the code generator.
Sure, but it gets messier how? What would the semantics of such an "=="
operator be?
What else would it be other than checking of 3 bytes all being
respectively equal to 3 other bytes?
It could also compare the entire structure, including any padding bytes.
If that's not what you mean, that's fine, but it still needs to be
specified. (Which I suppose you already did by calling it a "3-byte
compare".)
Are you proposing a change to the language, to permit "==" for
structures? If so, do you want to define it recursively in terms of
comparisons of all the members? (Which introduces some complexity for
floating-point NaNs, bit-fields, and gaps between members.)
In my opinion, byte-by-byte comparison of structures is *usually* not
meaningful (and we already have memcmp()). Member-by-member comparison
is more likely to be useful, but not enough so to justify adding it as a
built-in language feature.
In my experience, byte-by-byte comparison of structures is useful only
if the structures were carefully designed with this in mind. But once
so designed, it can be highly efficient in terms of both source code
and implementation performance.
Adding member-by-member comparison would also entail somehow declaring
the ordering (greater or less) for each field or field type, and to
declare the order of comparison significance (should {r, g, b, a}
structures be compared and sorted first by r, then by g etc. or the
other way round).
Adding such notation to C would be quite complex, and is a major
feature of basic C++, so no point in adding this to C.
I don't understand the comparisons with C++.
C++ doesn't have "generated" operators ==, !=, > etc.
C++ does have a generated operator = (just like C).
The feature that C++ has is operator overloading but I think this can be considered separately from "auto-generated" operators.
class X
{
//use the default
bool operator == (const X& other) = default;
//remove default
bool operator == (const X& other) = delete;
//custom
bool operator == (const X& other)
{
}
}
The only difference of C++ and C for == is that in C++ you can override the syntax.
I didn't know, but C++ already has some proposals.

Defaulted comparison operators
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n3950.html
Keith Thompson
2016-12-27 16:50:04 UTC
Permalink
Jakob Bohm <jb-***@wisemo.com> writes:
[...]
Post by Jakob Bohm
Adding member-by-member comparison would also entail somehow declaring
the ordering (greater or less) for each field or field type, and to
declare the order of comparison significance (should {r, g, b, a}
structures be compared and sorted first by r, then by g etc. or the
other way round).
Adding such notation to C would be quite complex, and is a major
feature of basic C++, so no point in adding this to C.
I don't recall anyone was suggesting supporting <, <=, >, and >=
for structures, just == and !=.
David Brown
2016-12-23 10:25:56 UTC
Permalink
Post by Keith Thompson
[...]
Post by BartC
I didn't know that. But why doesn't "==" work between structs?
I can understand that uninitialised padding bytes inside a struct can
cause problems, but if people have to resort to memcpy, then they will
have the same problems. No-one wants to have to compare structs
member-by-member.
But sometimes that's exactly what you need to do. If you use memcpy(),
you risk getting incorrect results if there are padding bytes. And
...
Didn't /you/ mean memcmp here rather than memcpy?

(It's so rare that you make such mistakes, so I have to take the chance
to point it out - especially since you spotted the same typo in Bart's
posts :-) )
Keith Thompson
2016-12-23 19:30:15 UTC
Permalink
Post by David Brown
Post by Keith Thompson
[...]
Post by BartC
I didn't know that. But why doesn't "==" work between structs?
I can understand that uninitialised padding bytes inside a struct can
cause problems, but if people have to resort to memcpy, then they will
have the same problems. No-one wants to have to compare structs
member-by-member.
But sometimes that's exactly what you need to do. If you use memcpy(),
you risk getting incorrect results if there are padding bytes. And
...
Didn't /you/ mean memcmp here rather than memcpy?
D'oh! Yes, I did.
Post by David Brown
(It's so rare that you make such mistakes, so I have to take the chance
to point it out - especially since you spotted the same typo in Bart's
posts :-) )
It's a fair cop.
David Brown
2016-12-23 09:28:37 UTC
Permalink
Post by Ben Bacarisse
<snip>
Post by David Brown
It would not take much addition to C to make it possible to implement
"swap" as a macro (and therefore possibly part of the library). gcc's
#define swap(a, b) \
    do { \
        typeof(a) t = b; \
        b = a; \
        a = t; \
    } while (0)
Adding "typeof" to the C language would be a lot more useful than adding
a "swap" feature. It would also be fairly easy to implement - gcc and
clang already have it, and most other major C compilers are also C++
compilers and therefore have "decltype" and "auto" - "typeof" could
probably share some of that work.
I agree about the value of typeof() but it doesn't solve all the
problems of writing swap. You'd like to rule out accidents like
int x;
float y;
...
swap(x, y);
Maybe you would /not/ want to rule out such uses - you are swapping the
/values/ of the two variables, not their low-level contents or bit patterns.

You might want warnings to spot such things, such as
"-Wfloat-conversion" in gcc.
Post by Ben Bacarisse
While it's not obviously wrong to write that, it probably does indicate
a mistake. A static assert on the sizes being equal would help but does
not address the main issue. I suppose one could have a new form of
constant expression: typeof(a) == typeof(b).
I'd like that. I am always in favour of as much static checking as
possible.

Again, gcc has a solution - but while I think it might be reasonable to
put the gcc extension "typeof" into the C standards, I'd be wary of
moving just any old gcc extension into the standards. However, for
completeness:

#define swap(a, b) \
    do { \
        _Static_assert(__builtin_types_compatible_p( \
            typeof(a), typeof(b)), \
            "swap: operands must have compatible types"); \
        typeof(a) t = (b); \
        (b) = (a); \
        (a) = t; \
    } while (0)
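For compilers without the gcc extensions, a byte-wise version in standard C11 is possible, along the lines of the memcpy-style function mentioned earlier in the thread. This is only a sketch; SWAP and swap_bytes are hypothetical names. It also shows Ben's point: the size check catches some mismatches, but not an int/float pair that happen to have the same size.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: exchange n bytes between two objects. */
static void swap_bytes(void *pa, void *pb, size_t n)
{
    unsigned char *a = pa, *b = pb;
    while (n--) {
        unsigned char t = *a;
        *a++ = *b;
        *b++ = t;
    }
}

/* Swaps object representations, not converted values. The static
   assertion only rejects operands of different sizes; it cannot tell
   that two same-sized operands have unrelated types. */
#define SWAP(a, b) \
    do { \
        static_assert(sizeof(a) == sizeof(b), \
                      "SWAP: operands must have the same size"); \
        swap_bytes(&(a), &(b), sizeof(a)); \
    } while (0)
```

Unlike the typeof version, this works on structs and arrays as well as scalars, but it needs both operands to be addressable lvalues and evaluates each operand expression more than once in the source text (though sizeof does not evaluate its operand at run time).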
BartC
2016-12-22 11:41:46 UTC
Permalink
Post by David Brown
Post by BartC
The standard way of swapping terms x and y in Python requires writing
each term twice. That needs extra care. And yes it can be a little less
efficient.
This is /programming/ - it is not ditch-digging. Yes, you need to take
care with what you write and how you write it - the same applies to all
programming, in all programming languages.
I can see I'm never going to convince you of anything! So C is perfect
as it is; Python is perfect as it is. Nothing needs to be done. If
anyone needs to go a little out of their way, then it's just
'programming'. If something is error-prone, you just need to take more
care, or do more unit-tests.

(Although that doesn't explain why there is a new version of Python
every five minutes.)
Post by David Brown
What's next after "swap" ? Do you want to add an operator that
calculates "4*a + b" just because that fits an x86 addressing mode, and
people could then use it to do efficient multiplication by 5 even on
Tiny C? Personally, I'd rather continue to write "x * 5" and let a good
compiler generate the efficient code on processors that have such
instructions.
You're being silly now. Doesn't C already have ++a and a+=b and P->m and
a?b:c and a && b and switch? All could be dispensed with:

++a          a=a+1
a+=b         a=a+b
P->m         (*P).m
a?b:c        if (a) temp=b; else temp=c; ... temp ...
a && b       !!a & !!b
a && b       if (a) if (b) ...
switch(x)    if (x==...)...

There are a number of constructs which both add expressiveness and also
tell the compiler exactly what is being done, rather than the
compiler having to deduce it.

There are a few others of which swap is one:

swap
min and max
n-way select (a?b:c is 2-way select)
n-times loops
a to b loops
non-int/non-const switch (here, where C has an if/else-if chain
comparing the same expr against a number of values; see below)
bit extraction
2-way/n-way select as lvalues
chained compares (Python-style a==b==c etc)

I know that you have a thing about gcc and its wonderful optimiser,
which lets you do things like this:

if (x[i+1]==a) {...}
else if (x[i+1]==f(i)) {...}
else if (x[i+1]....

So you write 'x[i+1]' N times and gcc magically generates code that only
evaluates it once (even being clever enough to know that, if i is global
and f() is external, f() won't change i for the third compare).

But hang on ... wouldn't it be better if you only wrote 'x[i+1]' *once*?
And that way the value is frozen so calls like f() can't affect
subsequent comparisons.

I guess you know better...
Post by David Brown
And the fact that it is /not/ common on newer processors and in
particular, it is not common on RISC designs, suggests that as an
operation it is not useful enough to be worth the cost of including it
in the processor. (Swap instructions were cheap to implement in single
cpu, single bus master, in-order processors. They are /very/ expensive
to implement in modern multi-core out-of-order processors with
load-store architectures.)
That's true; XCHG on x64 is too slow to be useful. But it doesn't change
the fact that it was /desirable/.
Post by David Brown
Post by BartC
That reminds me of a feature I had on an old language which was along
stack a, b
unstack a, b
stack/unstack map to push and pop instructions. There are no type checks
so this could do type-punning (stack a float, unstack to an int).
If you want to program in Forth, that's fine - but it's a very different
language from C.
This wasn't Forth, but an early version of my C-like language. But
what's wrong with it? It ticks all the boxes. It can do 'a,b=b,a' and a
lot more (via a macro such as ASSIGN2(a,b, b,c) if required). It doesn't
need to use the hardware stack.
--
Bartc
David Brown
2016-12-22 13:13:14 UTC
Permalink
Post by BartC
Post by David Brown
Post by BartC
The standard way of swapping terms x and y in Python requires writing
each term twice. That needs extra care. And yes it can be a little less
efficient.
This is /programming/ - it is not ditch-digging. Yes, you need to take
care with what you write and how you write it - the same applies to all
programming, in all programming languages.
I can see I'm never going to convince you of anything! So C is perfect
as it is; Python is perfect as it is. Nothing needs to be done. If
anyone needs to go a little out of their way, then it's just
'programming'. If something is error-prone, you just need to take more
care, or do more unit-tests.
You do a fine job at taking a small point, extrapolating it wildly, then
leaping to ridiculous conclusions. Sit down and relax - take a deep
breath, count to ten, put on some whale song music, or whatever helps
you calm down.

C has its flaws. If you had actually /read/ anything I have posted, you
would know I think that. But it works fine as it is - most C
programmers do not seem to have problems understanding pointers and
arrays, getting braces right, or any of the other things you find nearly
impossible. Most C programmers understand that writing C takes care and
effort. Most C programmers understand that their work is easier if they
get reasonable tools and learn how to use them properly.

You may also see that I sometimes discuss things I would prefer were
different in C, or that I think would make useful additions to the
language. But I make these either in the context of clear wishful
thinking that I know will never happen, or as ideas that are at least
somewhat realistic because they can be found as extensions in existing
compilers, or are in C++.

Ultimately, I am a practical programmer. I make a living writing
embedded code in C (plus some C++, plus Python PC programming, etc.).
My main interest in C groups is to get the best out of the language and
the tools, and be sure that my code is correct. To a very large extent,
I don't really care if my compiler rejects a piece of bad code because
the syntax of C described in the standards disallows it, or if it merely
triggers a warning in my compiler. In either case, I have avoided a
mistake.

So I find repeated discussions about how "C lets you write this bad
code" as tedious and pointless. C lets you write /good/ code, and C
tools help you write /good/ code and avoid /bad/ code. What more do you
want? Do you want to turn back time and change C forty years ago? Do
you need someone to hold your hand and explain to you /again/ how arrays
work every time you use one? If you can't cope with writing C
/correctly/, then pick a different language or a different career.


(As for Python - it is a very different language from C. It has its
strengths and weaknesses, and is useful for different tasks than C
programming. It is also off-topic here, since we are discussing C - so
while I will refer to Python for contrast or comparison with C, I am not
going into detail about what I like or dislike about it.)
Post by BartC
(Although that doesn't explain why there is a new version of Python
every five minutes.)
Post by David Brown
What's next after "swap" ? Do you want to add an operator that
calculates "4*a + b" just because that fits an x86 addressing mode, and
people could then use it to do efficient multiplication by 5 even on
Tiny C? Personally, I'd rather continue to write "x * 5" and let a good
compiler generate the efficient code on processors that have such
instructions.
You're being silly now.
You are the one who wants to add new features (operators or built-in
functions) to take advantage of processor instructions that exist on
some cpus - just so that limited compilers can generate better code.

And I suspect the number of times I would have use of a "swap" operator
in my programming is not that much more than the number of times I have
to multiply by 5.
Post by BartC
Doesn't C already have ++a and a+=b and P->m and
++a a=a+1
a+=b a=a+b
P->m (*P).m
a?b:c if (a) temp=b; else temp=c; ... temp ...
a && b !!a & !!b
a && b if (a) if (b) ...
switch(x) if (x==...)...
There are a number of constructs with both add expressibility and also
tell the compiler exactly what it is that is being done rather than the
compiler having to deduce it.
C is not designed to be an absolute minimal language - merely a
relatively small and stable language.
Post by BartC
swap
min and max
n-way select (a?b:c is 2-way select)
n-times loops
a to b loops
non-int/non-const switch (here, where C has an if/else-if chain
comparing the same expr against a number of values; see below)
bit extraction
2-way/n-way select as lvalues
chained compares (Python-style a==b==c etc)
I know that you have a thing about gcc and its wonderful optimiser,
if (x[i+1]=a) {...}
else if (x[i+1]=f(i)) {...}
else if (x[i+1]....
So you write 'x[i+1]' N times and gcc magically generates code that only
evaluates it once (even being clever enough to know that, if i is global
and f() is external, f() won't change i for the third compare).
But hang on ... wouldn't it be better if you only wrote 'x[i+1]' *once*?
And that way the value is frozen so calls like f() can't effect
subsequent comparisons.
I guess you know better...
Yes, I know better:

int x_i = x[i+1];
if (x_i == a) {
...
} else if (x_i == f(i)) {
...
} else if (x_i == ...

Those of us who understand C99, local variables and block scope, and who
use compilers with at least basic optimisation, are not afraid to make
new local variables as needed in order to get the code we want.
Post by BartC
Post by David Brown
And the fact that it is /not/ common on newer processors and in
particular, it is not common on RISC designs, suggests that as an
operation it is not useful enough to be worth the cost of including it
in the processor. (Swap instructions were cheap to implement in single
cpu, single bus master, in-order processors. They are /very/ expensive
to implement in modern multi-core out-of-order processors with
load-store architectures.)
That's true; XCHG on x64 is too slow to be useful. But it doesn't change
the fact that it was /desirable/.
The key desirable feature of having a low-level "exchange" operation is
to make it atomic. C11 gives you atomic_exchange() for that purpose.

Other than that, it is not /that/ hard to manually write out a "swap"
operation when you need it - or write your own macro or function if you
need it a lot. And it is not /that/ hard for a compiler to see such
patterns and generate optimal code for the target.

I don't disagree that it might sometimes be convenient to have a "swap"
operation built into the language. I just think it is not nearly useful
enough to be worth the effort of adding it to the language.
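(For illustration, a minimal C11 sketch of the options mentioned above. The names swap_int, SWAP_BYTES and take_flag are made up for this example; only atomic_exchange() and the <string.h> functions are standard. It is one way to write it, not the only one.)

```c
#include <assert.h>
#include <stdatomic.h>
#include <string.h>

/* Hand-written swap for one type: the classic three assignments. */
static inline void swap_int(int *a, int *b)
{
    int t = *a;
    *a = *b;
    *b = t;
}

/* Hypothetical generic macro: swaps two lvalues of equal size through a
   byte buffer. A size mismatch is rejected at compile time. The middle
   copy uses memmove so that SWAP_BYTES(x, x) stays well defined. */
#define SWAP_BYTES(a, b)                                            \
    do {                                                            \
        _Static_assert(sizeof(a) == sizeof(b), "size mismatch");    \
        unsigned char swap_tmp_[sizeof(a)];                         \
        memcpy(swap_tmp_, &(a), sizeof(a));                         \
        memmove(&(a), &(b), sizeof(a));                             \
        memcpy(&(b), swap_tmp_, sizeof(a));                         \
    } while (0)

/* The atomic case: store a new value and fetch the old one in a single
   indivisible step. */
static int take_flag(atomic_int *flag)
{
    return atomic_exchange(flag, 0);
}

static void example(void)
{
    int x = 1, y = 2;
    swap_int(&x, &y);
    assert(x == 2 && y == 1);

    double p = 1.5, q = -4.0;
    SWAP_BYTES(p, q);
    assert(p == -4.0 && q == 1.5);

    atomic_int flag;
    atomic_init(&flag, 1);
    assert(take_flag(&flag) == 1);   /* the old value comes back */
    assert(take_flag(&flag) == 0);   /* the first call cleared it */
}
```

The byte-buffer macro does no type checking beyond the size, which is one of the things a built-in operator could do better.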
Post by BartC
Post by David Brown
Post by BartC
That reminds me of a feature I had on an old language which was along
stack a, b
unstack a, b
stack/unstack map to push and pop instructions. There are no type checks
so this could do type-punning (stack a float, unstack to an int).
If you want to program in Forth, that's fine - but it's a very different
language from C.
This wasn't Forth, but an early version of my C-like language. But
what's wrong with it? It ticks all the boxes. It can do 'a,b=b,a' and a
lot more (via a macro such as ASSIGN2(a,b, b,c) if required). It doesn't
need to use the hardware stack.
I didn't say there was anything wrong with it - merely that it is a
different language, and from that rather small snippet and description
it looks more like Forth than C.
BartC
2016-12-22 15:11:04 UTC
Permalink
Post by David Brown
Post by BartC
if (x[i+1]=a) {...}
else if (x[i+1]=f(i)) {...}
else if (x[i+1]....
So you write 'x[i+1]' N times and gcc magically generates code that only
evaluates it once (even being clever enough to know that, if i is global
and f() is external, f() won't change i for the third compare).
But hang on ... wouldn't it be better if you only wrote 'x[i+1]' *once*?
And that way the value is frozen so calls like f() can't effect
subsequent comparisons.
I guess you know better...
int x_i = x[i+1];
if (x_i == a) {
...
} else if (x_i == f(i)) {
...
} else if (x_i == ...
Those of us who understand C99, local variables and block scope, and who
use compilers with at least basic optimisation, are not afraid to make
new local variables as needed in order to get the code we want.
So you're doing the compiler's job for it instead of getting on with
yours. And not doing it properly because x_i still occurs multiple times
as does "==". With the increased possibility of errors if you misspell
x_i or some other typo turns it into another variable.

(I might write such a construct in another syntax as:

case x[i+1]
when a then ...
when f(i) then ...
when b, c then ...

x[i+1] appears just once. "==" is implied in all tests. The last would
need to be written as 'else if (x_i==b || x_i==c)'. I had to think a
second to make sure it was || not &&!)
Post by David Brown
I don't disagree that it might sometimes be convenient to have a "swap"
operation built into the language. I just think it is not nearly useful
enough to be worth the effort of adding it to the language.
My implementation of it for primitive types is under 50 lines of code.

I've probably spent more effort arguing about it in this thread!
--
Bartc
David Brown
2016-12-23 11:01:57 UTC
Permalink
Post by BartC
Post by David Brown
Post by BartC
if (x[i+1]=a) {...}
else if (x[i+1]=f(i)) {...}
else if (x[i+1]....
So you write 'x[i+1]' N times and gcc magically generates code that only
evaluates it once (even being clever enough to know that, if i is global
and f() is external, f() won't change i for the third compare).
But hang on ... wouldn't it be better if you only wrote 'x[i+1]' *once*?
And that way the value is frozen so calls like f() can't effect
subsequent comparisons.
I guess you know better...
int x_i = x[i+1];
if (x_i == a) {
...
} else if (x_i == f(i)) {
...
} else if (x_i == ...
Those of us who understand C99, local variables and block scope, and who
use compilers with at least basic optimisation, are not afraid to make
new local variables as needed in order to get the code we want.
So you're doing the compiler's job for it instead of getting on with
yours.
No, I am doing the programmer's job. You wanted a series of comparisons
to x[i+1] but you did not want to evaluate x[i+1] more than once. So
the code above does that - problem solved.
Post by BartC
And not doing it properly because x_i still occurs multiple times
as does "==". With the increased possibility of errors if you misspell
x_i or some other typo turns it into another variable.
Really? You see misspelling "x_i" as a serious risk? Call it "i"
instead - it's unlikely you'll misspell that. Or pick a more
descriptive name for the purpose, or get a decent editor.
Post by BartC
case x[i+1]
when a then ...
when f(i) then ...
when b, c then ...
That's a different matter - you want a "switch" that has more flexible
cases. Such a feature might be nice, I agree. The origin of C's switch
is a sort of computed goto, so the switch cases are basically just
labels and therefore had to be constants.

In a structure like the one above, I would be concerned about exactly
when f(i) would be evaluated. Should it always be evaluated? Or only
if the match to "a" fails? What happens if f(i) happens to be the same
value as "b" or "a" ?

It looks like you mean this to be a compact form for a series of if/else
statements, which is /not/ the same thing as a switch.

What I would like, and think is not unreasonable, is for switches to
allow more general compile-time constant values for the cases, as C++
does. And gcc's case ranges (like "case 1 ... 10 :" ) are nice too.
Post by BartC
x[i+1] appears just once. "==" is implied in all tests. The last would
need to be written as 'else if (x_i==b || x_i==c)'. I had to think a
second to make sure it was || not &&!)
Post by David Brown
I don't disagree that it might sometimes be convenient to have a "swap"
operation built into the language. I just think it is not nearly useful
enough to be worth the effort of adding it to the language.
My implementation of it for primitive types is under 50 lines of code.
There is a great deal more to adding something like "swap" to C than
just the code implementation in the compiler. That is the difference
between a personal hobby project and a serious standardised language
with specifications, documentations, large numbers of implementations,
vast numbers of programmers, and a huge base of existing code.
Post by BartC
I've probably spent more effort arguing about it in this thread!
BartC
2016-12-23 17:58:08 UTC
Permalink
Post by David Brown
Post by BartC
So you're doing the compiler's job for it instead of getting on with
yours.
No, I am doing the programmer's job. You wanted a series of comparisons
to x[i+1] but you did not want to evaluate x[i+1] more than once. So
the code above does that - problem solved.
Post by BartC
And not doing it properly because x_i still occurs multiple times
as does "==". With the increased possibility of errors if you misspell
x_i or some other typo turns it into another variable.
Really? You see misspelling "x_i" as a serious risk? Call it "i"
instead - it's unlikely you'll misspell that. Or pick a more
descriptive name for the purpose, or get a decent editor.
Yes; as soon as you have to repeat something, there is going to be an
increased risk of typos. Some will be picked up, some not.

Like here, where C requires loop variables to be written three times:

for (i=0; i<M; ++i)
for (j=0; j<N; ++i)

Where I normally duplicate the first line then change the bits that need
to be different. But sometimes they can be left out. I don't know if an
editor will help with that.
Post by David Brown
That's a different matter - you want a "switch" that has more flexible
cases. Such a feature might be nice, I agree. The origin of C's switch
is a sort of computed goto, so the switch cases are basically just
labels and therefore had to be constants.
In a structure like the one above, I would be concerned about exactly
when f(i) would be evaluated. Should it always be evaluated? Or only
if the match to "a" fails? What happens if f(i) happens to be the same
value as "b" or "a" ?
It looks like you mean this to be a compact form for a series of if/else
statements, which is /not/ the same thing as a switch.
That was vaguely based on Ada's 'case' statement I think. And yes the
semantics are a little different from C's switch: test expressions are
evaluated in order until one is true; and test expressions can be
duplicated although only the first will match. (However two identical
runtime expressions can give different results.)
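(In C terms, those semantics amount to roughly the following sketch. The names classify, f and f_calls are invented for the illustration; the point is that the subject is evaluated once and each test expression is evaluated only if the earlier ones failed to match.)

```c
#include <assert.h>

static int f_calls;                 /* counts how often f() runs */

static int f(int i)
{
    ++f_calls;
    return i * 10;
}

/* Rough equivalent of:
       case x[i+1]
       when a    then result 1
       when f(i) then result 2
       when b, c then result 3
   with 0 returned when nothing matches. */
static int classify(const int *x, int i, int a, int b, int c)
{
    const int subject = x[i + 1];   /* evaluated exactly once */

    if (subject == a)
        return 1;
    else if (subject == f(i))       /* runs only if the match to a failed */
        return 2;
    else if (subject == b || subject == c)
        return 3;
    return 0;
}

static void example(void)
{
    int x[] = {0, 7, 5, 20};

    f_calls = 0;
    assert(classify(x, 0, 7, 1, 2) == 1);   /* matches a, f() never called */
    assert(f_calls == 0);
    assert(classify(x, 2, 9, 5, 6) == 2);   /* x[3] == f(2) == 20 */
    assert(f_calls == 1);
}
```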
Post by David Brown
What I would like, and think is not unreasonable, is for switches to
allow more general compile-time constant values for the cases, as C++
does. And gcc's case ranges (like "case 1 ... 10 :" ) are nice too.
That's another of those 'no-brainers' like binary literals and numeric
separators. The main argument against case x...y; when it has come up
before is that it would be too tempting to write "case 'A'...'Z':". Well
that's exactly what we want it for! Nobody cares about EBCDIC.
Post by David Brown
Post by BartC
My implementation of it for primitive types is under 50 lines of code.
There is a great deal more to adding something like "swap" to C than
just the code implementation in the compiler. That is the difference
between a personal hobby project and a serious standardised language
with specifications, documentations, large numbers of implementations,
vast numbers of programmers, and a huge base of existing code.
My projects in the past have been used for commercial products where
other coders were involved and there were numbers of installations and
existing user-programs which had to still work after a change in any of
the languages involved.

Not on any great scale but enough to appreciate some of the problems.
None of it changes the fact that the implementation of a built-in swap
can be reasonably straightforward (but not trivial when lvalues and type
matching are involved).
--
Bartc
Keith Thompson
2016-12-23 19:52:00 UTC
Permalink
[...]
Post by BartC
Post by David Brown
It looks like you mean this to be a compact form for a series of if/else
statements, which is /not/ the same thing as a switch.
That was vaguely based on Ada's 'case' statement I think. And yes the
semantics are a little different from C's switch: test expressions are
evaluated in order until one is true; and test expressions can be
duplicated although only the first will match. (However two identical
runtime expressions can give different results.)
Very loosely, I think. Ada's case statement, like C's, requires test
expressions to be compile-time constants and does not permit values
to be duplicated. It does permit ranges. (It also doesn't support
fallthrough, and it requires all possible values to be covered.)
Post by BartC
Post by David Brown
What I would like, and think is not unreasonable, is for switches to
allow more general compile-time constant values for the cases, as C++
does. And gcc's case ranges (like "case 1 ... 10 :" ) are nice too.
That's another of those 'no-brainers' like binary literals and numeric
separators. The main argument against case x...y; when it has come up
before is that it would be too tempting to write "case 'A'...'Z':". Well
that's exactly what we want it for! Nobody cares about EBCDIC.
Nobody? Have you verified that by asking everybody? I understand that
*you* don't care about EBCDIC, and that's fine.

EBCDIC isn't the only argument against `case 'A'...'Z':`. Those are not
the only uppercase letters; in fact Unicode currently has 1327 of them.
The isupper() function is sensitive to the current locale.

Having said all that, I wouldn't actually object to allowing ranges in
case labels. It's just as easy to screw up by writing
if (c >= 'A' && c <= 'Z')
or by calling isupper() without first calling setlocale() -- and perhaps
sometimes you *want* just the 26 Latin upper case letters.

But like any change to the standard, it needs to carry substantial
benefit to justify adding it.
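
(To make the three alternatives concrete, here is a small sketch. The function names are invented for this example. The case-range form is a GNU extension accepted by GCC and Clang, not standard C, so it is guarded and falls back to the plain comparison elsewhere.)

```c
#include <assert.h>
#include <ctype.h>

/* Locale-independent, but only correct where 'A'..'Z' are contiguous
   (true for ASCII, not for EBCDIC). */
static int is_ascii_upper_if(int c)
{
    return c >= 'A' && c <= 'Z';
}

/* The same test written with the GNU case-range extension. */
static int is_ascii_upper_switch(int c)
{
#if defined(__GNUC__)
    switch (c) {
    case 'A' ... 'Z':
        return 1;
    default:
        return 0;
    }
#else
    return is_ascii_upper_if(c);
#endif
}

/* Locale-sensitive. In the default "C" locale only the 26 letters A-Z
   count. The cast keeps the argument in the range isupper() accepts. */
static int is_upper_locale(int c)
{
    return isupper((unsigned char)c) != 0;
}

static void example(void)
{
    assert(is_ascii_upper_if('A') && is_ascii_upper_if('Z'));
    assert(!is_ascii_upper_if('a') && !is_ascii_upper_if('0'));
    assert(is_ascii_upper_switch('M') && !is_ascii_upper_switch('m'));
    assert(is_upper_locale('Q') && !is_upper_locale('q'));
}
```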

[...]
--
Keith Thompson (The_Other_Keith) kst-***@mib.org <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
s***@casperkitty.com
2016-12-23 23:03:04 UTC
Permalink
Post by Keith Thompson
Nobody? Have you verified that by asking everybody? I understand that
*you* don't care about EBCDIC, and that's fine.
EBCDIC isn't the only argument against `case 'A'...'Z':`. Those are not
the only uppercase letters; in fact Unicode currently has 1327 of them.
The isupper() function is sensitive to the current locale.
A significant fraction of code that works with letters is designed for
processing "machine-readable" material which will consist entirely of
ASCII characters; code which wants to e.g. normalize HTML tags to lowercase
should convert "IFRAME" to "iframe" even in a Turkish locale. As for
EBCDIC, data exists in EBCDIC format and there is a need for code and
equipment to process such data, but I don't see much need to port code
which is written for other machines to use EBCDIC; it would seem far more
sensible to focus on migrating code and data from EBCDIC systems to ASCII
systems.
Keith Thompson
2016-12-23 23:26:01 UTC
Permalink
Post by s***@casperkitty.com
Post by Keith Thompson
Nobody? Have you verified that by asking everybody? I understand that
*you* don't care about EBCDIC, and that's fine.
EBCDIC isn't the only argument against `case 'A'...'Z':`. Those are not
the only uppercase letters; in fact Unicode currently has 1327 of them.
The isupper() function is sensitive to the current locale.
A significant fraction of code that works with letters is designed for
processing "machine-readable" material which will consist entirely of
ASCII characters; code which wants to e.g. normalize HTML tags to lowercase
should convert "IFRAME" to "iframe" even in a Turkish locale. As for
EBCDIC, data exists in EBCDIC format and there is a need for code and
equipment to process such data, but I don't see much need to port code
which is written for other machines to use EBCDIC; it would seem far more
sensible to focus on migrating code and data from EBCDIC systems to ASCII
systems.
You snipped the part of my message in which I acknowledged that:

perhaps sometimes you *want* just the 26 Latin upper case letters.

You provided an example supporting a point that I made, but managed to
make it look like you were refuting what I wrote. Please don't do that.
--
Keith Thompson (The_Other_Keith) kst-***@mib.org <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
Thiago Adams
2016-12-23 20:52:05 UTC
Permalink
Post by David Brown
[..]
And I suspect the number of times I would have use of a "swap" operator
in my programming is not that much more than the number of times I have
to multiply by 5.
Swap is very common for me.
I started to use swap in C++. Swap has an important property: it doesn't throw or fail. The C++ community has a name for functions that leave their operands unchanged if they fail: they say such a function provides the "strong guarantee".
Swap is the best way to implement a function with the strong guarantee because you can work on temporaries and then "commit" the results to the operands safely using swap.

I am using swap in C in the same way, but I also found another reason.

void GetName(String* name)
{
String temp = STRING_INIT;
String_Append(&temp, "something");

swap(temp, *name);

String_Destroy(&temp);
}

Using swap I don't need to care about the previous state of name. (The only requirement is that it must be initialized.)
I can change its state completely without writing a new function like "String_Set". Let's say I write

*name = temp;

This would be dangerous and uncommon for me.
So swap is more important than assignment. And assignment is used mainly for initialization in my code.

So, having a concept that is more common than assignment in my code, I would like to have a simple way to use it.
Swap is something so basic that the assignment operator in C++ is often implemented in terms of swap (the copy-and-swap idiom).
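
(A self-contained sketch of that "build in a temporary, commit with swap" pattern in plain C. Str, STR_INIT, Str_Append, Str_Swap and Str_Destroy are hypothetical stand-ins written for this example; they are not the String library used in the snippet above, whose definitions are not shown here.)

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    char  *data;   /* NUL-terminated, or NULL when empty */
    size_t len;
} Str;

#define STR_INIT { NULL, 0 }

static void Str_Destroy(Str *s)
{
    free(s->data);
    s->data = NULL;
    s->len = 0;
}

/* Returns 0 on success, -1 on allocation failure (s is left unchanged). */
static int Str_Append(Str *s, const char *text)
{
    size_t add = strlen(text);
    char *p = realloc(s->data, s->len + add + 1);
    if (p == NULL)
        return -1;
    memcpy(p + s->len, text, add + 1);   /* copies the terminator too */
    s->data = p;
    s->len += add;
    return 0;
}

/* Exchanging two structs cannot fail: no allocation, no deep copy. */
static void Str_Swap(Str *a, Str *b)
{
    Str t = *a;
    *a = *b;
    *b = t;
}

/* Build the result in a temporary; *name is touched only on success. */
static int GetName(Str *name)
{
    Str temp = STR_INIT;
    int rc = -1;

    if (Str_Append(&temp, "some") == 0 && Str_Append(&temp, "thing") == 0) {
        Str_Swap(&temp, name);           /* commit */
        rc = 0;
    }
    Str_Destroy(&temp);   /* old contents of *name, or the partial result */
    return rc;
}

static void example(void)
{
    Str name = STR_INIT;
    assert(Str_Append(&name, "old value") == 0);
    assert(GetName(&name) == 0);
    assert(strcmp(name.data, "something") == 0);
    Str_Destroy(&name);
}
```

If an append fails halfway, the caller's string still holds its old value, which is the property described above.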
s***@casperkitty.com
2016-12-23 22:51:15 UTC
Permalink
Post by Thiago Adams
Using swap I don't need to care about the previous state of name. (The only requirement is that it must be initialized.)
Calling "destroy" upon the previous name would fail if it hadn't been
initialized, but sometimes it can be helpful to have operations--both
copies and exchanges--which are agnostic to whether things have been
initialized. While some platforms' "normal" means of copying certain
types might trap when copying uninitialized values, it's useful to be
able to snapshot or permute arrays which contain a mixture of initialized
and unused items and there's no good reason that should be difficult.
Keith Thompson
2016-12-22 17:12:15 UTC
Permalink
BartC <***@freeuk.com> writes:
[...]
Post by BartC
I can see I'm never going to convince you of anything! So C, is perfect
as it is; Python is perfect as it is.
I don't believe anyone here has ever said that C is perfect.

I suggest you go find some people who actually do think C is perfect,
and argue with them.
--
Keith Thompson (The_Other_Keith) kst-***@mib.org <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
Keith Thompson
2016-12-22 16:54:39 UTC
Permalink
[...]
Post by David Brown
Post by BartC
The standard way of swapping terms x and y in Python requires writing
each term twice. That needs extra care. And yes it can be a little less
efficient.
This is /programming/ - it is not ditch-digging. Yes, you need to take
care with what you write and how you write it - the same applies to all
programming, in all programming languages.
[...]

I know what you're saying, but ditch-digging is also a skill. (If you
need a ditch, don't ask me to dig it for you.)
--
Keith Thompson (The_Other_Keith) kst-***@mib.org <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
David Brown
2016-12-23 11:05:59 UTC
Permalink
Post by Keith Thompson
[...]
Post by David Brown
Post by BartC
The standard way of swapping terms x and y in Python requires writing
each term twice. That needs extra care. And yes it can be a little less
efficient.
This is /programming/ - it is not ditch-digging. Yes, you need to take
care with what you write and how you write it - the same applies to all
programming, in all programming languages.
[...]
I know what you're saying, but ditch-digging is also a skill. (If you
need a ditch, don't ask me to dig it for you.)
Yes, I know - it's just the traditional example.

I tried digging a ditch in our garden (for burying electric cables
between the house and the garage). After several weekends worth of
blisters, sore backs, and minimal progress, we hired professionals - it
took 15 minutes with a mini digger.

Having the right tools, and knowing how to use them properly, is
important in /any/ job.
s***@casperkitty.com
2016-12-22 20:14:52 UTC
Permalink
Post by David Brown
Post by BartC
There's a wonderfully fast C compiler called Tiny C, but it generates
poor code. Wouldn't it be great if new language features simply involved
added a few dozen lines of straightforward code and you'd immediately
get the full benefit? You wouldn't need to use -O3. Instead of relying
on the vagaries of different compilers' optimisers which may or may not
be able to recognise the new idiom to get the best code.
Are you seriously suggesting that we could add enough new features,
operators, and built-in functions to C and then small and simple
compilers would magically generate as efficient code as large and
complex ones?
When targeting older architectures where the performance of a piece of
code would not generally be sensitive to what code had run before or
after it, even simple compilers could generate code which was decently
efficient if programmers helped them out.

If a compiler can automate some of the code substitutions needed to achieve
optimal performance, that may give programmers more time to make other sorts
of improvement. On the other hand, if a compiler's efforts at "optimization"
make it impossible for programmers to implement optimizations of their own
that would have been even more beneficial, such behavior would run counter
to the purpose of generating efficient code.
Thiago Adams
2016-12-20 16:57:29 UTC
Permalink
Post by Kaz Kylheku
Post by Thiago Adams
Post by Kaz Kylheku
Post by Thiago Adams
swap-expression
_swap ( unary-expression, unary-expression)
It's pretty stupid to add this kind of thing as a primitive ("special
form", in Lisp terms) rather than providing a way for it to be written
in the language.
Remember that one of the selling points of C over Pascal and its ilk is
that the library, including I/O and formatted printing, could all be
written in C. Not like that silly "writeln" cruft that has to be wired
into the parser, and has its own syntax that cannot be imitated by
a programmer-written wrapper.
There is already a descendant dialect of C in which you can write swap,
namely C++.
You don't have to, because std::swap is provided.
If developing C is beating a dead horse, adding a swap operator is like
grafting a horn onto a dead horse's forehead to make a unicorn.
To use swap in C++ you need to include <algorithm>. Types that provide
swap needs to add a member function swap and a non-member function
swap, so the global swap can be used using name lookup rules. If we
consider swap something fundamental and basic then it also could be
added to C++ as operator.
A less sophisticated swap can be obtained as a simple template
template <typename T> void swap(T &x, T &y)
{ T temp = x; x = y; y = temp; }
You need a swap member function in C++ to access private data and make an efficient swap for types like vector. Swap is not generated automatically like operator= and the copy constructor.
The non-member swap is required because generic algorithms would otherwise not know whether to call the member swap or the non-member swap.
So it's recommended to provide both. To avoid this problem some C++ programmers want what they call Unified Call Syntax.
BartC
2016-12-20 13:56:36 UTC
Permalink
Post by Kaz Kylheku
Post by Thiago Adams
swap-expression
_swap ( unary-expression, unary-expression)
It's pretty stupid to add this kind of thing as a primitive ("special
form", in Lisp terms) rather than providing a way for it to be written
in the language.
I don't agree, because you can make the same argument for any number of
language features, to end up with a minimalist language where you can
(and have to) build up everything you need from primitive elements. But
you can't usually add syntax.
Post by Kaz Kylheku
Remember that one of the selling points of C over Pascal and its ilk is
that the library, including I/O and formatted printing, could all be
written in C.
It doesn't need a selling point any more. C might as well be holding a
gun to everyone's head saying 'use me', as there is little other choice.
Because if developing programs at low level, you can't do much without
having to deal with C or C interfaces at some point.

So if many are obliged to use C, why shouldn't someone want extra
features to make their life a bit easier? (Of course, C is never going
to be developed that way.)
Post by Kaz Kylheku
Not like that silly "writeln" cruft that has to be wired
into the parser, and has its own syntax that cannot be imitated by
a programmer-written wrapper.
Yes, that's the point of having dedicated syntax. To give some life,
some shape, to a language so that everything doesn't just look like a
big mass of function calls. Or smothered in parentheses.

(One or two languages such as Seed7 apparently allow user-defined
syntax. However I've never seen any Seed7 code that looks like anything
other than Seed7. Or maybe I have but didn't recognise it!)

Regarding 'writeln' over 'printf': look at printf; just look at it! If
that's the best that i/o written with user-code functions can do, then
it stinks (and to do even that, they had to bolt on 'variadic functions').

If 'printf' could have been made much better if it was a built-in
language feature, even if it still used function syntax, then why not?

Then I wouldn't have to keep changing "%d" to "%ld" to "%llu" and back
again as variables' types change; or I could print X without having to
first investigate what type X might happen to be.
Post by Kaz Kylheku
There is already a descendant dialect of C in which you can write swap,
namely C++.
You don't have to, because std::swap is provided.
Which suggests that some people considered it useful!
--
Bartc
Keith Thompson
2016-12-20 16:41:53 UTC
Permalink
BartC <***@freeuk.com> writes:
[...]
Post by BartC
Regarding 'writeln' over 'printf': look at printf; just look at it! If
that's the best that i/o written with user-code functions can do, then
it stinks (and to do even that, they had to bolt on 'variadic functions').
If 'printf' could have been made much better if it was a built-in
language feature, even if it still used function syntax, then why not?
[...]

Because the same mechanism that's used to implement printf in C code
can be used to implement other user-defined functions. See printk,
for example (an internal Linux kernel function), and other variadic
functions like execl().

I'm not a big fan of the way printf works, but it does work.
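
(As a small illustration of that point, here is a user-written variadic function built on the same <stdarg.h> mechanism. The name log_to_buffer and its tag parameter are hypothetical; it only forwards to the standard snprintf and vsnprintf.)

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Writes "[tag] " followed by the printf-style message into buf.
   Returns the number of characters the full message needs, or -1 if
   the tag alone does not fit or formatting fails. */
static int log_to_buffer(char *buf, size_t size,
                         const char *tag, const char *fmt, ...)
{
    int n = snprintf(buf, size, "[%s] ", tag);
    if (n < 0 || (size_t)n >= size)
        return -1;

    va_list ap;
    va_start(ap, fmt);
    int m = vsnprintf(buf + n, size - (size_t)n, fmt, ap);
    va_end(ap);

    return m < 0 ? -1 : n + m;
}

static void example(void)
{
    char buf[64];
    int n = log_to_buffer(buf, sizeof buf, "net",
                          "%d packets from %s", 3, "eth0");
    assert(strcmp(buf, "[net] 3 packets from eth0") == 0);
    assert(n == (int)strlen(buf));
}
```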
--
Keith Thompson (The_Other_Keith) kst-***@mib.org <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
BartC
2016-12-20 18:10:09 UTC
Permalink
Post by Keith Thompson
[...]
Post by BartC
Regarding 'writeln' over 'printf': look at printf; just look at it! If
that's the best that i/o written with user-code functions can do, then
it stinks (and to do even that, they had to bolt on 'variadic functions').
If 'printf' could have been made much better if it was a built-in
language feature, even if it still used function syntax, then why not?
[...]
Because the same mechanism that's used to implement printf in C code
can be used to implement other user-defined functions. See printk,
for example (an internal Linux kernel function), and other variadic
functions like execl().
I'm not a big fan of the way printf works, but it does work.
Well it is workable (and I often make use of sprintf from outside the
language). But the benefit of having it built-in is that the compiler
knows the types so can automatically overload the conversion operators
(choosing one of %d, %lld, %f, %s for example).

User-code then just needs to control the display (width, base etc) if it
needs to.
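
(Something close to this can be approximated in C11 with _Generic, at least when the value's type is known at compile time. The macros FMT_OF and FORMAT_VALUE below are invented for this sketch and cover only a handful of types; anything not listed is rejected by the compiler.)

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Pick a conversion specifier from the static type of x. */
#define FMT_OF(x) _Generic((x),         \
    int:                "%d",           \
    unsigned int:       "%u",           \
    long:               "%ld",          \
    long long:          "%lld",         \
    unsigned long long: "%llu",         \
    float:              "%f",           \
    double:             "%f",           \
    char *:             "%s",           \
    const char *:       "%s")

/* Format one value into buf without spelling out its type. */
#define FORMAT_VALUE(buf, size, x) \
    snprintf((buf), (size), FMT_OF(x), (x))

static void example(void)
{
    char buf[32];
    long long big = 1234567890123LL;

    FORMAT_VALUE(buf, sizeof buf, big);
    assert(strcmp(buf, "1234567890123") == 0);

    FORMAT_VALUE(buf, sizeof buf, 42);
    assert(strcmp(buf, "42") == 0);
}
```

Changing the declared type of big from long long to int needs no edit at the call site, which is the convenience being asked for. Width, base and precision would still have to be passed some other way.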
--
Bartc
David Brown
2016-12-21 12:48:06 UTC
Permalink
Post by BartC
Post by Keith Thompson
[...]
Post by BartC
Regarding 'writeln' over 'printf': look at printf; just look at it! If
that's the best that i/o written with user-code functions can do, then
it stinks (and to do even that, they had to bolt on 'variadic functions').
If 'printf' could have been made much better if it was a built-in
language feature, even if it still used function syntax, then why not?
[...]
Because the same mechanism that's used to implement printf in C code
can be used to implement other user-defined functions. See printk,
for example (an internal Linux kernel function), and other variadic
functions like execl().
I'm not a big fan of the way printf works, but it does work.
Well it is workable (and I often make use of sprintf from outside the
language). But the benefit of having it built-in is that the compiler
knows the types so can automatically overload the conversion operators
(choosing one of %d, %lld, %f, %s for example).
User-code then just needs to control the display (width, base etc) if it
needs to.
Since it is a standard library function, a compiler can know how it is
implemented and then at least warn about mismatches between format
specifiers and parameter types. But it can't do automatic conversions -
sometimes the format string is not a compile-time constant, and it needs
to be consistent whether the format string is known at compile time or
only at run time.
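
As a sketch of that kind of checking applied to a user-written function: GCC and Clang accept a format attribute that asks for the same format/argument checks they perform on printf. This is a compiler extension, not standard C, and the names PRINTF_LIKE and format_msg are invented for the example.

```c
#include <stdarg.h>
#include <stdio.h>

/* Compiler extension (GCC/Clang): request printf-style checking of the
   format string against the variadic arguments. Expands to nothing on
   other compilers. */
#if defined(__GNUC__)
#define PRINTF_LIKE(fmt_idx, first_arg) \
    __attribute__((format(printf, fmt_idx, first_arg)))
#else
#define PRINTF_LIKE(fmt_idx, first_arg)
#endif

/* Hypothetical wrapper: formats into buf like snprintf. */
int format_msg(char *buf, size_t size, const char *fmt, ...) PRINTF_LIKE(3, 4);

int format_msg(char *buf, size_t size, const char *fmt, ...)
{
    va_list ap;
    int n;

    va_start(ap, fmt);
    n = vsnprintf(buf, size, fmt, ap);   /* forwards the va_list unchanged */
    va_end(ap);
    return n;
}
```

The check only happens when the format string is a literal the compiler can see; a format string built at run time gets no diagnostics, which is the limitation described above.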
s***@casperkitty.com
2016-12-22 18:37:27 UTC
Permalink
Post by BartC
Regarding 'writeln' over 'printf': look at printf; just look at it! If
that's the best that i/o written with user-code functions can do, then
it stinks (and to do even that, they had to bolt on 'variadic functions').
The difficulty is that variadic functions weren't "bolted on"; rather, the
way in which early C implementations passed arguments on the stack meant
that *all* functions were *naturally* variadic. If a language included
explicit support for variadic functions, it could have very cheaply
supported an intrinsic which would report what the next argument type was
(along with intrinsics to extract the next int, double, data pointer,
function pointer, etc.), and so could offer type safety at very low cost. In
fact, for some implementations such an approach might actually take less
code than the present approach. If f1 and f2 are local variables of type
"float", code for printf("%5.2f %5.2f",f1,f2); would need to have the caller
convert each variable to type "double" before passing it. A mechanism
that was designed for such variadic functions, however, could instead
simply pass a pointer to a structure with information about the arguments,
allowing all such conversions to be done in one library function, rather
than at every printf call site.
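
A rough sketch of what such a mechanism could look like if written by hand in today's C. Everything here (arg_kind, arg_desc, next_as_double, sum_args) is hypothetical and only meant to show the shape of the idea: the caller passes descriptors, and the float-to-double conversion lives in one library routine.

```c
#include <stddef.h>

/* Hypothetical type tags for the arguments being described. */
enum arg_kind { ARG_INT, ARG_FLOAT, ARG_DOUBLE };

/* Hypothetical descriptor: what the argument is and where it lives. */
struct arg_desc {
    enum arg_kind kind;
    const void *addr;
};

/* Library-side fetch: the conversion to double is done here once,
   instead of being emitted at every call site. */
double next_as_double(const struct arg_desc **cursor)
{
    const struct arg_desc *d = (*cursor)++;

    switch (d->kind) {
    case ARG_INT:    return (double)*(const int *)d->addr;
    case ARG_FLOAT:  return (double)*(const float *)d->addr;
    case ARG_DOUBLE: return *(const double *)d->addr;
    }
    return 0.0;   /* unknown tag */
}

/* Example consumer: sums all described arguments. */
double sum_args(const struct arg_desc *args, size_t count)
{
    double total = 0.0;

    for (size_t i = 0; i < count; i++)
        total += next_as_double(&args);
    return total;
}
```

Because every argument carries a tag, the callee can reject or convert a mismatched type instead of misreading the stack.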
Post by BartC
If 'printf' could have been made much better if it was a built-in
language feature, even if it still used function syntax, then why not?
A type-safe mechanism for variadic arguments could have been both safer and
more size-efficient than the existing approach. Ensuring inter-operability
between functions generated by different compilers for the same platform
might have been slightly tricky, but not insurmountable.
Jakob Bohm
2016-12-27 16:41:41 UTC
Permalink
Post by s***@casperkitty.com
Post by BartC
Regarding 'writeln' over 'printf': look at printf; just look at it! If
that's the best that i/o written with user-code functions can do, then
it stinks (and to do even that, they had to bolt on 'variadic functions').
The difficulty is that variadic functions weren't "bolted on"; rather, the
way in which early C implementations passed arguments on the stack meant
that *all* functions were *naturally* variadic. If a language included
explicit support for variadic functions, it could have very cheaply
supported an intrinsic which would report what the next argument type was
(along with intrinsics to extract the next int, double, data pointer,
function pointer, etc.), and so could offer type safety at very low cost. In
fact, for some implementations such an approach might actually take less
code than the present approach. If f1 and f2 are local variables of type
"float", code for printf("%5.2f %5.2f",f1,f2); would need to have the caller
convert each variable to type "double" before passing it. A mechanism
that was designed for such variadic functions, however, could instead
simply pass a pointer to a structure with information about the arguments,
allowing all such conversions to be done in one library function, rather
than at every printf call site.
Post by BartC
If 'printf' could have been made much better if it was a built-in
language feature, even if it still used function syntax, then why not?
A type-safe mechanism for variadic arguments could have been both safer and
more size-efficient than the existing approach. Ensuring inter-operability
between functions generated by different compilers for the same platform
might have been slightly tricky, but not insurmountable.
It would only be more size efficient where a full set of built in types
can occur in any order and a separate argument holds an encoding of the
specific types passed on each invocation (as is the case for printf and
scanf).

Where the possibilities are much more constrained by the function
design, automatically passing around type designators is pure unused
overhead. I have written and used variadic functions where all the
varying arguments were of (almost) the same type or followed some
semantic pattern (such as char*, int, int, char*, int, int, ...).
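
A minimal sketch of such a patterned variadic function, assuming a null pointer marks the end of the list. The function total_span and its meaning are invented for illustration; the point is that the callee already knows the types from the pattern, so per-argument type tags would carry no information.

```c
#include <stdarg.h>
#include <stddef.h>

/* Hypothetical function: takes (char *name, int lo, int hi) triples,
   terminated by a (char *)NULL name, and returns the sum of hi - lo.
   The names are ignored here; a real function would presumably use them. */
int total_span(const char *first_name, ...)
{
    va_list ap;
    int total = 0;

    va_start(ap, first_name);
    for (const char *name = first_name; name != NULL;
         name = va_arg(ap, char *)) {
        int lo = va_arg(ap, int);   /* the pattern fixes the types */
        int hi = va_arg(ap, int);
        total += hi - lo;
    }
    va_end(ap);
    return total;
}
```

A call looks like total_span("a", 1, 4, "b", 10, 12, (char *)NULL); the cast on the terminator matters because NULL may be a plain integer constant.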

It should also be noted that for the printf/writeln task, splitting the
operation up into a type-dependent call per argument, with constant
string arguments being just another type to output is a common way to
efficiently implement this in Pascal, and how this is actually done for
C++ "streams" (where all those functions are named "operator <<").
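
In C the same split can be written out by hand, as in the sketch below. The out_buf type and the put_* functions are hypothetical names for this example; each call handles exactly one argument of one known type, so no format string and no variadic call is involved.

```c
#include <stdio.h>

/* Hypothetical fixed-size output buffer; must start zero-initialized. */
struct out_buf {
    char data[64];
    size_t len;
};

/* Advance len by the number of characters snprintf actually stored. */
static void advance(struct out_buf *o, int n)
{
    size_t room = sizeof o->data - o->len;

    if (n < 0)
        return;
    o->len += ((size_t)n < room) ? (size_t)n : room - 1;
}

/* One type-specific writer per argument type. */
void put_str(struct out_buf *o, const char *s)
{
    advance(o, snprintf(o->data + o->len, sizeof o->data - o->len, "%s", s));
}

void put_int(struct out_buf *o, int v)
{
    advance(o, snprintf(o->data + o->len, sizeof o->data - o->len, "%d", v));
}
```

A writeln-style statement then becomes a sequence such as put_str(&o, "x = "); put_int(&o, 42); put_str(&o, "\n");, which is roughly what a compiler with built-in output would be generating for you.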

But let's get back to the swap discussion and leave variadics for
another thread.


Enjoy

Jakob
--
Jakob Bohm, CIO, Partner, WiseMo A/S. https://www.wisemo.com
Transformervej 29, 2860 Søborg, Denmark. Direct +45 31 13 16 10
This public discussion message is non-binding and may contain errors.
WiseMo - Remote Service Management for PCs, Phones and Embedded
s***@casperkitty.com
2016-12-27 17:27:54 UTC
Permalink
Post by Jakob Bohm
Post by s***@casperkitty.com
A type-safe mechanism for variadic arguments could have been both safer and
more size-efficient than the existing approach. Ensuring inter-operability
between functions generated by different compilers for the same platform
might have been slightly tricky, but not insurmountable.
It would only be more size efficient where a full set of built in types
can occur in any order and a separate argument holds an encoding of the
specific types passed on each invocation (as is the case for printf and
scanf).
On many platforms it could generally be more *space*-efficient, at some
expense in execution time, because it is extremely common for functions
to be passed things of a few discrete forms, e.g.

constant
value at global symbol
value at constant displacement from global symbol
value at constant displacement from function's initial stack pointer

On a typical ARM like the Cortex-m0, passing a global variable to printf
will take 4-6 bytes of code, in addition to a 4-byte literal for the
variable's address (4 bytes, word-aligned) which will have to be stored
in code space near the printf call, and the 2+ bytes for the format
specifier. So 10-12+ bytes for each such variable printed. Many ways of
encoding argument descriptors could be more concise than that.
Jakob Bohm
2016-12-29 16:39:39 UTC
Permalink
Post by s***@casperkitty.com
Post by Jakob Bohm
Post by s***@casperkitty.com
A type-safe mechanism for variadic arguments could have been both safer and
more size-efficient than the existing approach. Ensuring inter-operability
between functions generated by different compilers for the same platform
might have been slightly tricky, but not insurmountable.
It would only be more size efficient where a full set of built in types
can occur in any order and a separate argument holds an encoding of the
specific types passed on each invocation (as is the case for printf and
scanf).
On many platforms it could generally be more *space*-efficient, at some
expense in execution time, because it is extremely common for functions
to be passed things of a few discrete forms, e.g.
constant
value at global symbol
value at constant displacement from global symbol
value at constant displacement from function's initial stack pointer
On a typical ARM like the Cortex-m0, passing a global variable to printf
will take 4-6 bytes of code, in addition to a 4-byte literal for the
variable's address (4 bytes, word-aligned) which will have to be stored
in code space near the printf call, and the 2+ bytes for the format
specifier. So 10-12+ bytes for each such variable printed. Many ways of
encoding argument descriptors could be more concise than that.
Not sure what you are trying to say here.

For printf(), most arguments are values, not variable addresses, thus
there should (not even on ARMv7m) be no difference in cost between
reading those values and passing them to printf() or other variadic
functions.

For any variadic function, it seems that you are suggesting that using
extra stack/argument space to pass type information would somehow
reduce the amount of code space to load a variable value before passing
it.

It is a generally recognized principle that a function should not know
or care where the caller obtained the argument values. This would be
the same regardless if the function is variadic or not.

The System-V derived ELF-style GOT table is an abomination that adds
crazy code to every global variable access based on a misguided attempt
to make dynamic libraries (.so files) simulate the potential failure
modes of static libraries (.a files). Not that ELF discussion belongs
in the C newsgroup, but I suspect that is what your observed bad code
generation is from.



Enjoy

Jakob
--
Jakob Bohm, CIO, Partner, WiseMo A/S. https://www.wisemo.com
Transformervej 29, 2860 Søborg, Denmark. Direct +45 31 13 16 10
This public discussion message is non-binding and may contain errors.
WiseMo - Remote Service Management for PCs, Phones and Embedded
s***@casperkitty.com
2016-12-29 17:16:36 UTC
Permalink
Post by Jakob Bohm
Post by s***@casperkitty.com
On many platforms it could generally be more *space*-efficient, at some
expense in execution time, because it is extremely common for functions
to be passed things of a few discrete forms, e.g.
constant
value at global symbol
value at constant displacement from global symbol
value at constant displacement from function's initial stack pointer
Not sure what you are trying to say here.
For printf(), most arguments are values, not variable addresses, thus
there should (not even on ARMv7m) be no difference in cost between
reading those values and passing them to printf() or other variadic
functions.
If the object to be printed is in one of the above forms and the arguments-
descriptor object describes it fully, *there would be no need for the
calling code to pass the object at all*. If the argument descriptor
includes a length (encoded using a variable-length coding) and then a
packed list of object descriptors, and then a list of global addresses
used by the objects described in the list, then passing a global variable
of type "float" as an argument would likely cause a byte or two to be
added to the list of descriptors saying "next parameter is a float, stored
at the next address in the address list". No need to have the caller fetch
the value, convert it to double, and put it on the stack; the library
function to convert the next argument to double would take care of all that.
Post by Jakob Bohm
For any variadic function, it seems that you are suggesting that using
extra stack/argument space to pass type information would somehow
reduce the amount of code space to load a variable value before passing
it.
It would in many cases eliminate the need for the calling code to load the
variable at all.
Post by Jakob Bohm
It is a generally recognized principle that a function should not know
or care where the caller obtained the argument values. This would be
the same regardless if the function is variadic or not.
Code which uses library functions to retrieve arguments wouldn't need to
know or care where those functions got the values from. The library
functions themselves would need to know, but the compiler would ensure
that they got whatever information they needed.
Post by Jakob Bohm
The System-V derived ELF-style GOT table is an abomination that adds
crazy code to every global variable access based on a misguided attempt
to make dynamic libraries (.so files) simulate the potential failure
modes of static libraries (.a files). Not that ELF discussion belongs
in the C newsgroup, but I suspect that is what your observed bad code
generation is from.
Interpreted code is often smaller than machine code. Since most code that
uses variadic functions isn't speed critical, the mechanism I'm suggesting
would effectively move a lot of the argument-preparation logic from machine
code into something analogous to interpreted code. Rather than generating
machine code for "get the address of some float into a register, load the
float at that address, convert it to double, and place it on the stack",
a compiler would generate some data which would be interpreted by the va-arg
fetch library function as a request to do the same thing.
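
A toy version of that "interpreted" fetch, to make the idea concrete. The encoding is an assumption made up for this sketch (one code byte per argument, 'i' for int and 'f' for float, plus a parallel list of addresses), and va_packet and va_packet_next are hypothetical names; a real compiler-generated scheme would be denser.

```c
#include <stddef.h>

/* Hypothetical packed description of a call's variable arguments:
   codes is a NUL-terminated string with one code byte per argument,
   addrs holds the address of each argument in the same order. */
struct va_packet {
    const char *codes;
    const void *const *addrs;
};

/* Interprets the next descriptor. On success stores the argument,
   converted to double, in *out and returns 1; returns 0 at the end of
   the list or on an unknown code. The caller never loaded the value. */
int va_packet_next(struct va_packet *p, double *out)
{
    char code = *p->codes;
    const void *addr;

    if (code == '\0')
        return 0;
    addr = *p->addrs;
    if (code == 'i')
        *out = (double)*(const int *)addr;
    else if (code == 'f')
        *out = (double)*(const float *)addr;   /* float -> double done here */
    else
        return 0;
    p->codes++;
    p->addrs++;
    return 1;
}
```

The call site shrinks to data (the code string and the address list), at the price of a slower, data-driven fetch inside the library.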

Keith Thompson
2016-12-20 16:54:12 UTC
Permalink
Thiago Adams <***@gmail.com> writes:
[...]
Post by Thiago Adams
My first thought was to add in the grammar at the same position of sizeof/cast.
1) If swap was operator <>
conditional-expression
unary-expression assignment-operator assignment-expression
assignment-operator: one of
= *= /= %= += -= <<= >>= &= ^= |=
<> swap operator <-here
And the requirement that both operands must be values, and that the
types must match in some to-be-determined way, would be constraints.

I wouldn't use "<>"; it could be confused (by beginners) with the Pascal
inequality operator. "<=>" visually suggests bidirectional assignment.
"<->" is another possibility.
Post by Thiago Adams
2) if swap was a keyword
conditional-expression
unary-expression assignment-operator assignment-expression
swap-expression <-here
swap-expression
_swap ( unary-expression, unary-expression)
It would have to be _Swap, not _swap. But if it's a keyword, there's no
reason at all that it can't be an operator:

swap-expression:
unary-expression _Swap unary-expression
--
Keith Thompson (The_Other_Keith) kst-***@mib.org <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
Keith Thompson
2016-12-20 17:20:28 UTC
Permalink
Keith Thompson <kst-***@mib.org> writes:
[...]
Post by Keith Thompson
And the requirement that both operands must be values, and that the
types must match in some to-be-determined way, would be constraints.
Typo: I mean "lvalues", not "values".
--
Keith Thompson (The_Other_Keith) kst-***@mib.org <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
BartC
2016-12-20 18:18:40 UTC
Permalink
Post by Keith Thompson
[...]
Post by Thiago Adams
My first thought was to add in the grammar at the same position of sizeof/cast.
1) If swap was operator <>
conditional-expression
unary-expression assignment-operator assignment-expression
assignment-operator: one of
= *= /= %= += -= <<= >>= &= ^= |=
<> swap operator <-here
And the requirement that both operands must be values, and that the
types must match in some to-be-determined way, would be constraints.
I wouldn't use "<>"; it could be confused (by beginners) with the Pascal
inequality operator. "<=>" visually suggests bidirectional assignment.
"<->" is another possibility.
In another language, I used :=: for a while, in a syntax where assignment
was :=

But it was slightly cryptic, and I found it easy to just type 'swap',
written as 'x swap y' or 'swap(x,y)' (it's a keyword, not an operator or
function). I now exclusively use swap(x,y).

I don't think much of _Swap(x,y) although I suppose people can always
wrap a macro 'swap' around it.
--
Bartc
Keith Thompson
2016-12-20 20:05:11 UTC
Permalink
BartC <***@freeuk.com> writes:
[...]
Post by BartC
I don't think much of _Swap(x,y) although I suppose people can always
wrap a macro 'swap' around it.
Adding "swap" as a keyword is not an option. It would break
existing code.

The convention since C99 has been for (most) new keywords to start
with an underscore and an uppercase letter, often with an optional
macro for the more common form. For example, _Bool was added as
a keyword, and <stdbool.h> adds `#define bool _Bool`.

*If* swap were to be added as a language feature, I think I'd favor
an assignment-like operator, perhaps spelled "<=>". It avoids the
complications that result when a new keyword is added, as well as
any confusion with function calls, and the syntax is reasonably
intuitive.

(One potential drawback of "<=>" is that it's an operator in Perl,
that yields -1, 0, or 1 if its left operand is less than, equal to,
or greater than its right operand. It's conceivable, but unlikely,
that a future C standard might adopt that.)

Another possibility might be multiple assignment, as in Python:
x, y = y, x;
which also permits more complicated things like:
x, y, z = y, z, x;

but that would conflict with the existing comma operator. Perhaps some
fairly lightweight syntax could be added to disambiguate it.

<JOKE>I know, we can use "static"!</JOKE>
--
Keith Thompson (The_Other_Keith) kst-***@mib.org <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
b***@gmail.com
2016-12-20 21:13:55 UTC
Permalink
Post by Keith Thompson
x, y = y, x;
I already mentioned a problem with that Python approach, which is that complex terms might be evaluated twice, and of course they have to be written twice.

C also has a problem with wrapping such an operation in a function (that would be hopefully inlined), as it would need calling as swap(&x, &y). Plus you'd need to arrange for different swap functions for different operand sizes.
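
For reference, a sketch of what that function-based approach looks like in C11, with _Generic choosing among per-type functions. The names swap_int, swap_double and the swap macro are illustrative only, and only the two listed pointer types are supported.

```c
/* One swap function per operand type; callers must pass addresses. */
static inline void swap_int(int *a, int *b)
{
    int t = *a;
    *a = *b;
    *b = t;
}

static inline void swap_double(double *a, double *b)
{
    double t = *a;
    *a = *b;
    *b = t;
}

/* Hypothetical dispatcher: selects the function from the pointer type.
   Any other pointer type is a compile-time error. */
#define swap(a, b) _Generic((a), \
    int *:    swap_int,          \
    double *: swap_double)((a), (b))
```

It still has to be called as swap(&x, &y), and every new type needs another function and another _Generic entry.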
Post by Keith Thompson
x, y, z = y, z, x;
That's an interesting idea. Except that while I've used swap, I've rarely used such a rotation.
--
bartc
Keith Thompson
2016-12-20 21:40:35 UTC
Permalink
Post by b***@gmail.com
Post by Keith Thompson
x, y = y, x;
I already mentioned a problem with that Python approach, which is that
complex terms might be evaluated twice, and of course they have to be
written twice.
How often do the operands need to be complex terms?
Post by b***@gmail.com
C also has a problem with wrapping such an operation in a function
(that would be hopefully inlined), as it would need calling as
swap(&x, &y). Plus you'd need to arrange for different swap functions
for different operand sizes.
The same applies to assignment.
Post by b***@gmail.com
Post by Keith Thompson
x, y, z = y, z, x;
That's an interesting idea. Except that while I've used swap, I've
rarely used such a rotation.
--
Keith Thompson (The_Other_Keith) kst-***@mib.org <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
BartC
2016-12-20 21:48:53 UTC
Permalink
Post by Keith Thompson
Post by b***@gmail.com
Post by Keith Thompson
x, y = y, x;
I already mentioned a problem with that Python approach, which is that
complex terms might be evaluated twice, and of course they have to be
written twice.
How often do the operands need to be complex terms?
About as often as they need to be for min() and max().

But if you're not worried about writing terms out twice and possibly
invoking side-effects, what was the point of being able to write:

a += b;

instead of a = a + b; ?
Post by Keith Thompson
Post by b***@gmail.com
C also has a problem with wrapping such an operation in a function
(that would be hopefully inlined), as it would need calling as
swap(&x, &y). Plus you'd need to arrange for different swap functions
for different operand sizes.
The same applies to assignment.
Well, assignment is already an operator so there is no need!
--
Bartc
Keith Thompson
2016-12-20 22:29:03 UTC
Permalink
Post by BartC
Post by Keith Thompson
Post by b***@gmail.com
Post by Keith Thompson
x, y = y, x;
I already mentioned a problem with that Python approach, which is that
complex terms might be evaluated twice, and of course they have to be
written twice.
How often do the operands need to be complex terms?
About as often as they need to be for min() and max().
But if you're not worried about writing terms out twice and possibly
a += b;
instead of a = a + b; ?
The "+=" syntax is clear, convenient, and obvious once it's explained.
I can't think of a good way to do something similar for a swap operator.

Say you want to swap an element of one array with an element of another:

arr1[func()] <=> arr2[func()];

func() is going to be called twice. How would you specify that a
subexpression of one operand that happens to be identical to a
subexpression of the other operand should only be evaluated once? I'm
not suggesting it would be difficult to implement, I'm saying I can't
think of a good syntax for it.

Oh, wait, yes I can:

const int index = func();
arr1[index] <=> arr2[index];
Post by BartC
Post by Keith Thompson
Post by b***@gmail.com
C also has a problem with wrapping such an operation in a function
(that would be hopefully inlined), as it would need calling as
swap(&x, &y). Plus you'd need to arrange for different swap functions
for different operand sizes.
The same applies to assignment.
Well, assignment is already an operator so there is no need!
And if swap were an operator there would also be no need. I thought
that was the context we were discussing.
--
Keith Thompson (The_Other_Keith) kst-***@mib.org <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
BartC
2016-12-20 23:03:51 UTC
Permalink
Post by Keith Thompson
Post by BartC
Post by Keith Thompson
Post by b***@gmail.com
Post by Keith Thompson
x, y = y, x;
I already mentioned a problem with that Python approach, which is that
complex terms might be evaluated twice, and of course they have to be
written twice.
How often do the operands need to be complex terms?
About as often as they need to be for min() and max().
But if you're not worried about writing terms out twice and possibly
a += b;
instead of a = a + b; ?
The "+=" syntax is clear, convenient, and obvious once it's explained.
I can't think of a good way to do something similar for a swap operator.
I would have thought that 'swap' was self-explanatory!
Post by Keith Thompson
arr1[func()] <=> arr2[func()];
func() is going to be called twice. How would you specify that a
subexpression of one operand that happens to be identical to a
subexpression of the other operand should only be evaluated once?
That's not the problem. Here func() clearly needs to be called twice.
But in your proposed syntax:

arr1[func()], arr2[func()] = arr2[func()], arr1[func()]

it will be called four times. And it really needs that both arr1[] terms
have the same index, and both arr2[]. (That is, if you want to do an
exchange. But this syntax will also do a,b = c,d.)

This I think is a more likely example (and a bit of actual code, from a
list rotate routine funnily enough):

swap(a[first++], a[next++])

But written as:

a[first++], a[next++] = a[next++], a[first++]

it's clear that it won't work.
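
For comparison, a macro can get the evaluate-each-operand-once behaviour by taking the address of each operand first. This is only a sketch: SWAP is a hypothetical name, the caller has to name the type explicitly, and both operands must be lvalues of that type.

```c
/* Hypothetical macro: each operand expression is evaluated exactly once
   (when its address is taken), so side effects such as first++ happen
   only once. T is the common type of x and y. */
#define SWAP(T, x, y)                 \
    do {                              \
        T *swap_px_ = &(x);           \
        T *swap_py_ = &(y);           \
        T swap_tmp_ = *swap_px_;      \
        *swap_px_ = *swap_py_;        \
        *swap_py_ = swap_tmp_;        \
    } while (0)
```

So SWAP(int, a[first++], a[next++]) exchanges the two elements and increments each index once, which the written-out x, y = y, x form cannot do.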
--
Bartc
Keith Thompson
2016-12-20 23:44:20 UTC
Permalink
Post by BartC
Post by Keith Thompson
Post by BartC
Post by Keith Thompson
Post by b***@gmail.com
Post by Keith Thompson
x, y = y, x;
I already mentioned a problem with that Python approach, which is that
complex terms might be evaluated twice, and of course they have to be
written twice.
How often do the operands need to be complex terms?
About as often as they need to be for min() and max().
But if you're not worried about writing terms out twice and possibly
a += b;
instead of a = a + b; ?
The "+=" syntax is clear, convenient, and obvious once it's explained.
I can't think of a good way to do something similar for a swap operator.
I would have thought that 'swap' was self-explanatory!
I meant that something that is to "swap" as "+=" is to "+" is not
obvious -- but I may have misunderstood the context.
Post by BartC
Post by Keith Thompson
arr1[func()] <=> arr2[func()];
func() is going to be called twice. How would you specify that a
subexpression of one operand that happens to be identical to a
subexpression of the other operand should only be evaluated once?
That's not the problem. Here func() clearly needs to be called twice.
arr1[func()], arr2[func()] = arr2[func()], arr1[func()]
it will be called four times. And it really needs that both arr1[] terms
have the same index, and both arr2[]. (That is, if you want to do an
exchange. But this syntax will also do a,b = c,d.)
This I think is a more likely example (and a bit of actual code, from a
swap(a[first++], a[next++])
a[first++], a[next++] = a[next++], a[first++]
it's clear that it won't work.
Well, it will work if you save the index values in separate variables,
but I see your point.

If C adopted multi-value assignment, it could be used for swapping,
and would be perfectly adequate in most cases, but it wouldn't
avoid evaluating the operands twice in cases where that matters.
(But at least the operands would be explicitly written twice,
so the double evaluation wouldn't be hidden.) A dedicated swap
operator would be less general, but wouldn't have that issue.

I personally don't think that's a very strong argument in favor of one
approach over the other, since it's easy to work around it. But then I
don't think either feature is likely to make it into a new version of C.
--
Keith Thompson (The_Other_Keith) kst-***@mib.org <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
Richard Bos
2016-12-27 11:02:20 UTC
Permalink
Post by Keith Thompson
Post by BartC
Post by Keith Thompson
Post by BartC
But if you're not worried about writing terms out twice and possibly
a += b;
instead of a = a + b; ?
The "+=" syntax is clear, convenient, and obvious once it's explained.
I can't think of a good way to do something similar for a swap operator.
I would have thought that 'swap' was self-explanatory!
I meant that something that is to "swap" as "+=" is to "+" is not
obvious -- but I may have misunderstood the context.
Erm... how?

a += b means a = a + b. a gets a new value, b does not.
a <=> b would mean a(new) = b(old), b(new) = a(old). How are you going
to make a reflexive expression of that?

a <=>= b would have to mean a = a <=> b. If we're being picky, that
_should_ invoke UB because it assigns to a twice. But if we fudge that,
I can still see two cases (three, really, and the third is the best):

1. a <=> b evaluates to a(new), i.e. b(old). a(final) becomes a(new),
and so a <=>= b is the same thing as a <=> b.
2. a <=> b evaluates to a(old), i.e. b(new). a(final) becomes a(old),
as does b(new). This means that _nothing happens_ to a, so a <=>= b is
the same thing as b = a.
3. a <=> b doesn't evaluate to anything, and using it as an expression
is a syntax error. a <=>= b doesn't exist. (Or as a variation, a <=> b
does evaluate to either value, as TPTB prefer, but <=>= is recognised as
useless and still doesn't exist.)

Richard
BartC
2016-12-27 12:17:47 UTC
Permalink
Post by Richard Bos
Post by Keith Thompson
Post by BartC
Post by Keith Thompson
Post by BartC
But if you're not worried about writing terms out twice and possibly
a += b;
instead of a = a + b; ?
The "+=" syntax is clear, convenient, and obvious once it's explained.
I can't think of a good way to do something similar for a swap operator.
I would have thought that 'swap' was self-explanatory!
I meant that something that is to "swap" as "+=" is to "+" is not
obvious -- but I may have misunderstood the context.
Erm... how?
a += b means a = a + b.
(Actually it's not even that obvious. In general, if you have:

x += Y; // or *= etc

where Y is a general expression (or is a macro that expands to an
expression), then it's not true that this is always the same as:

x = x + Y;

Example: 'a += b<<c' is different from 'a = a+b<<c'; 'a *= b+c' is
different from 'a = a*b+c'.)
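
A small worked check of those two cases. The function names are made up for the example; each pair computes the compound form and the naive textual expansion so the results can be compared.

```c
/* a += b << c   is   a = a + (b << c) */
int add_shift_compound(int a, int b, int c)
{
    a += b << c;
    return a;
}

/* a = a + b << c   parses as   a = (a + b) << c,
   because + binds tighter than <<. */
int add_shift_naive(int a, int b, int c)
{
    a = a + b << c;
    return a;
}

/* a *= b + c   is   a = a * (b + c) */
int mul_add_compound(int a, int b, int c)
{
    a *= b + c;
    return a;
}

/* a = a * b + c   parses as   a = (a * b) + c */
int mul_add_naive(int a, int b, int c)
{
    a = a * b + c;
    return a;
}
```

With a=1, b=2, c=3 the shift forms give 17 and 24; with a=2, b=3, c=4 the multiply forms give 14 and 10.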

a gets a new value, b does not.
Post by Richard Bos
a <=> b would mean a(new) = b(old), b(new) = a(old). How are you going
to make a reflexive expression of that?
I'm not sure that's what he meant. This is about justifying using
'swap(x,y)' rather than 'x,y=y,x' or any scheme where x and/or y have to
be repeated, in the same way that 'x+=y' is used in place of 'x=x+y'.
--
Bartc
Ben Bacarisse
2016-12-20 19:01:15 UTC
Permalink
Thiago Adams <***@gmail.com> writes:
<snip>
Post by Thiago Adams
1) If swap was operator <>
conditional-expression
unary-expression assignment-operator assignment-expression
assignment-operator: one of
= *= /= %= += -= <<= >>= &= ^= |=
<> swap operator <-here
Another, slightly more flexible option is to extend assignment to permit
multiple targets:

a, b = b, a

The ',' is a problem in C since it has so many uses so some other syntax
would be needed. Let's just go with [a, b] = [b, a] for the moment. A
knock-on effect would be that this syntax could be defined to work with
arrays, thereby not only providing an array swap but also a normal
array assignment without having to mess with C's existing = operator.
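
For what it's worth, existing C already allows array assignment and swap when the array is wrapped in a struct, since struct assignment copies the members. A minimal sketch, with int4, assign_int4 and swap_int4 as made-up names:

```c
/* Hypothetical wrapper: a fixed-size array inside a struct. */
struct int4 {
    int v[4];
};

/* Array "assignment": assigning the struct copies the array in it. */
void assign_int4(struct int4 *dst, const struct int4 *src)
{
    *dst = *src;
}

/* Array swap built from three struct assignments. */
void swap_int4(struct int4 *a, struct int4 *b)
{
    struct int4 t = *a;
    *a = *b;
    *b = t;
}
```

This only works for arrays whose size is fixed in the type, which is part of why a dedicated syntax might still be attractive.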

<snip>
--
Ben.
Jakob Bohm
2016-12-21 05:11:14 UTC
Permalink
Post by Keith Thompson
Post by Thiago Adams
I think if _swap where an operator in C we would have many advantages.
[...]
Post by Thiago Adams
I don't know if intrinsic functions could do the job because of the signature of types?
Intrinsic functions can do anything you like. C doesn't currently
have intrinsic functions, so you're free to define their semantics.
I doubt that the committee would be willing to add such a major
new feature just to permit swapping.
A memswap() function would be one reasonable approach.
I agree that adding void memswap(void*, void*, size_t) to <string.h>
would satisfy most of the need for a swap operator, in particular the
need to swap objects in chunks that fit the register capacity and other
performance aspects of the target machine, as well as the possibility
of some compilers inlining the operation. Aliasing rules should be the
same as for memcpy plus special casing for the two pointer arguments
comparing equal (but no provisions for detecting overlap like memmove,
that would add unnecessary time and code size cost to the majority of
programs).

One major use of this would be in sorting and table routines similar to
qsort (which usually does this internally, though some implementations
might need a 3-way rotation instead to avoid allocating an
element-sized intermediary buffer).
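
A portable sketch of such a function, under the rules just described (no partial overlap, equal pointers allowed). It is named my_memswap here because identifiers beginning with "mem" followed by a lowercase letter are reserved for the library; the 64-byte chunk size is an arbitrary choice for the example, not a tuned value.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical memswap: exchanges n bytes between a and b.
   The regions must not partially overlap; a == b is a no-op. */
void my_memswap(void *a, void *b, size_t n)
{
    unsigned char *pa = a;
    unsigned char *pb = b;
    unsigned char tmp[64];   /* small fixed buffer, no allocation */

    if (pa == pb)
        return;
    while (n > 0) {
        size_t chunk = n < sizeof tmp ? n : sizeof tmp;

        memcpy(tmp, pa, chunk);
        memcpy(pa, pb, chunk);
        memcpy(pb, tmp, chunk);
        pa += chunk;
        pb += chunk;
        n -= chunk;
    }
}
```

For small constant sizes a compiler that inlines the memcpy calls can plausibly reduce this to a few register moves, but that depends on the implementation and is not guaranteed.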

Enjoy

Jakob
--
Jakob Bohm, CIO, Partner, WiseMo A/S. https://www.wisemo.com
Transformervej 29, 2860 Søborg, Denmark. Direct +45 31 13 16 10
This public discussion message is non-binding and may contain errors.
WiseMo - Remote Service Management for PCs, Phones and Embedded
Thiago Adams
2016-12-21 10:46:32 UTC
Permalink
Post by Jakob Bohm
Post by Keith Thompson
Post by Thiago Adams
I think if _swap where an operator in C we would have many advantages.
[...]
Post by Thiago Adams
I don't know if intrinsic functions could do the job because of the
signature of types?
Intrinsic functions can do anything you like. C doesn't currently
have intrinsic functions, so you're free to define their semantics.
I doubt that the committee would be willing to add such a major
new feature just to permit swapping.
A memswap() function would be one reasonable approach.
I agree that adding void memswap(void*, void*, size_t) to <string.h>
would satisfy most of the need for a swap operator, in particular the
need to swap objects in chunks that fit the register capacity and other
performance aspects of the target machine, as well as the possibility
of some compilers inlining the operation. Aliasing rules should be the
same as for memcpy plus special casing for the two pointer arguments
comparing equal (but no provisions for detecting overlap like memmove,
that would add unnecessary time and code size cost to the majority of
programs).
One major use of this would be in sorting and table routines similar to
qsort (which usually does this internally, though some implementation
might need a 3-way rotation instead to avoid allocating an
element-sized intermediary buffer).
Enjoy
Jakob
I would use memswap if it were as efficient as manual code
{ T temp = a; a = b; b = temp; }

Do you have an implementation in mind?
Jakob Bohm
2016-12-21 12:40:28 UTC
Permalink
Post by Thiago Adams
Post by Jakob Bohm
Post by Keith Thompson
Post by Thiago Adams
I think if _swap where an operator in C we would have many advantages.
[...]
Post by Thiago Adams
I don't know if intrinsic functions could do the job because of the
signature of types?
Intrinsic functions can do anything you like. C doesn't currently
have intrinsic functions, so you're free to define their semantics.
I doubt that the committee would be willing to add such a major
new feature just to permit swapping.
A memswap() function would be one reasonable approach.
I agree that adding void memswap(void*, void*, size_t) to <string.h>
would satisfy most of the need for a swap operator, in particular the
need to swap objects in chunks that fit the register capacity and other
performance aspects of the target machine, as well as the possibility
of some compilers inlining the operation. Aliasing rules should be the
same as for memcpy plus special casing for the two pointer arguments
comparing equal (but no provisions for detecting overlap like memmove,
that would add unnecessary time and code size cost to the majority of
programs).
One major use of this would be in sorting and table routines similar to
qsort (which usually does this internally, though some implementation
might need a 3-way rotation instead to avoid allocating an
element-sized intermediary buffer).
Enjoy
Jakob
I would use memswap if it was efficient like manual code
{ T temp = a; a = b; b= temp; }
Do you have an implementation in mind?
For an out-of-line implementation, loop over blocks of half the
register bank size, using registers as temporaries, then do the last
partial block. Exact block size depends on the instruction set
architecture, such as how many registers would be needed to implement
a = b for each block.

For an inline implementation, call the external implementation for
large or variable size, but do a simple swap via register inline for
small constant size, preferably before optimization passes that could
merge the steps with nearby load/store of the swapped objects.
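
A sketch of that split in portable C, under the assumption that the compiler reduces fixed-size memcpy calls to plain loads and stores (many do, but it is not guaranteed). The names swap_small and swap_out_of_line are hypothetical:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical out-of-line fallback for large or variable sizes. */
static void swap_out_of_line(void *p1, void *p2, size_t n)
{
    unsigned char *a = p1, *b = p2;

    if (a == b)
        return;
    while (n-- > 0) {
        unsigned char t = *a;
        *a++ = *b;
        *b++ = t;
    }
}

/* Hypothetical inline front end: small constant sizes go through
   integer temporaries, everything else calls the external routine.
   Both values are loaded before either is stored, so passing the
   same pointer twice is harmless. */
static inline void swap_small(void *p1, void *p2, size_t n)
{
    switch (n) {
    case sizeof(uint32_t): {
        uint32_t x, y;
        memcpy(&x, p1, sizeof x);
        memcpy(&y, p2, sizeof y);
        memcpy(p1, &y, sizeof y);
        memcpy(p2, &x, sizeof x);
        break;
    }
    case sizeof(uint64_t): {
        uint64_t x, y;
        memcpy(&x, p1, sizeof x);
        memcpy(&y, p2, sizeof y);
        memcpy(p1, &y, sizeof y);
        memcpy(p2, &x, sizeof x);
        break;
    }
    default:
        swap_out_of_line(p1, p2, n);
        break;
    }
}
```

When n is a compile-time constant such as sizeof(int), an optimizer can usually discard the switch and keep only the matching case.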

Depending on CPU architecture other optimizations may be relevant, such
as cache preloading, the add/subtract trick for some CPUs where that
would be faster etc. Basically all the kinds of things that quality
hosted C implementations do for functions such as memcpy().

For example on the x86 architecture, this would depend on the optimal
choice between doing "a = b" with movs, with load/store through MMX/SSE
registers or with load/store through core registers. As another
example, on the ARM architecture, the optimal code would differ between
Arm, Thumb1, Thumb2 and Arm64 instruction variants, and may also depend
on the CPU implementation (e.g. Cortex A8 versus Cortex A13 versus
Cortex M1).


Enjoy

Jakob
--
Jakob Bohm, CIO, Partner, WiseMo A/S. https://www.wisemo.com
Transformervej 29, 2860 Søborg, Denmark. Direct +45 31 13 16 10
This public discussion message is non-binding and may contain errors.
WiseMo - Remote Service Management for PCs, Phones and Embedded
Thiago Adams
2016-12-21 16:37:22 UTC
Permalink
Post by Jakob Bohm
Post by Thiago Adams
Post by Jakob Bohm
Post by Keith Thompson
Post by Thiago Adams
I think if _swap where an operator in C we would have many advantages.
[...]
Post by Thiago Adams
I don't know if intrinsic functions could do the job because of the
signature of types?
Intrinsic functions can do anything you like. C doesn't currently
have intrinsic functions, so you're free to define their semantics.
I doubt that the committee would be willing to add such a major
new feature just to permit swapping.
A memswap() function would be one reasonable approach.
I agree that adding void memswap(void*, void*, size_t) to <string.h>
would satisfy most of the need for a swap operator, in particular the
need to swap objects in chunks that fit the register capacity and other
performance aspects of the target machine, as well as the possibility
of some compilers inlining the operation. Aliasing rules should be the
same as for memcpy plus special casing for the two pointer arguments
comparing equal (but no provisions for detecting overlap like memmove,
that would add unnecessary time and code size cost to the majority of
programs).
One major use of this would be in sorting and table routines similar to
qsort (which usually does this internally, though some implementation
might need a 3-way rotation instead to avoid allocating an
element-sized intermediary buffer).
Enjoy
Jakob
I would use memswap if it was efficient like manual code
{ T temp = a; a = b; b= temp; }
Do you have an implementation in mind?
For an out of line implementation, loop over blocks of half
registerbank size, using registers as temp, then do the last partial
block. Exact block size depends on the instruction set architecture,
such as how many registers would be needed to implement a = b for each
block.
It doesn't seem as fast as
{ T temp = a; a = b; b = temp; }
for a pointer swap, because it takes two steps.

(Of course we need to see the code and the generated code)

I just found this:

https://gustedt.wordpress.com/2010/10/23/a-generic-swap-implementation/
Jakob Bohm
2016-12-21 21:02:19 UTC
Permalink
Post by Thiago Adams
Post by Jakob Bohm
Post by Thiago Adams
Post by Jakob Bohm
Post by Keith Thompson
Post by Thiago Adams
I think if _swap where an operator in C we would have many advantages.
[...]
Post by Thiago Adams
I don't know if intrinsic functions could do the job because of the
signature of types?
Intrinsic functions can do anything you like. C doesn't currently
have intrinsic functions, so you're free to define their semantics.
I doubt that the committee would be willing to add such a major
new feature just to permit swapping.
A memswap() function would be one reasonable approach.
I agree that adding void memswap(void*, void*, size_t) to <string.h>
would satisfy most of the need for a swap operator, in particular the
need to swap objects in chunks that fit the register capacity and other
performance aspects of the target machine, as well as the possibility
of some compilers inlining the operation. Aliasing rules should be the
same as for memcpy plus special casing for the two pointer arguments
comparing equal (but no provisions for detecting overlap like memmove,
that would add unnecessary time and code size cost to the majority of
programs).
One major use of this would be in sorting and table routines similar to
qsort (which usually does this internally, though some implementation
might need a 3-way rotation instead to avoid allocating an
element-sized intermediary buffer).
Enjoy
Jakob
I would use memswap if it was efficient like manual code
{ T temp = a; a = b; b= temp; }
Do you have an implementation in mind?
For an out of line implementation, loop over blocks of half
registerbank size, using registers as temp, then do the last partial
block. Exact block size depends on the instruction set architecture,
such as how many registers would be needed to implement a = b for each
block.
It doesn't seem as fast as
{ T temp = a; a = b; b= temp; }
for pointers swap, because it has two steps.
I was trying to describe exactly that, for small objects, but a loop
using a fixed size (all in registers) buffer for larger objects.
Post by Thiago Adams
(Of course we need to see the code and the generated code)
The out-of-line implementation would be a piece of optimized (possibly
assembler) code in the C runtime library. Just as is currently the
case for the library implementations of memcpy, memmove, memcmp etc.

Something like (using a hypothetical 64 bit RISC CPU):

// Initial code to select algorithm variant and load args into
// registers, appropriately interleaved and optimized.
// This also special cases calling with the same pointer twice
// (NOP)
...
// Possibly handle the first few bytes and otherwise special case
// unaligned input pointers (because memswap(a + 3, b + 1, 11)
// is perfectly valid).
...

InnerLoopHuge:
LD r4, [r0]
LD r5, [r0 + 8]
LD r6, [r0 + 16]
LD r7, [r0 + 24]
LD r8, [r1]
LD r9, [r1 + 8]
LD r10, [r1 + 16]
LD r11, [r1 + 24]
ST [r1], r4
ST [r1 + 8], r5
ST [r1 + 16], r6
ST [r1 + 24], r7
ST [r0], r8
ST [r0 + 8], r9
ST [r0 + 16], r10
ST [r0 + 24], r11
ADD r0, #32
ADD r1, #32
ADD r2, #-1
CACHEHINT [r0 + 1024]
CACHEHINT [r1 + 1024]
JC InnerLoopHuge
...
InnerLoopMedium:
// Same as InnerLoopHuge, but without the cache preloading of next
// kibibyte
...

Tail24Bytes:
LD r4, [r0]
LD r5, [r0 + 8]
LD r6, [r0 + 16]
LD r8, [r1]
LD r9, [r1 + 8]
LD r10, [r1 + 16]
ST [r1], r4
ST [r1 + 8], r5
ST [r1 + 16], r6
ST [r0], r8
ST [r0 + 8], r9
ST [r0 + 16], r10
JMP TailSubWord

// Etc. for Other tail byte counts / 8 (4 variants)

Tail7Bytes:
LD32BIT r4,[r0]
LD16BIT r5,[r0 + 4]
LDBYTE r6,[r0 + 6]
LD32BIT r8,[r1]
LD16BIT r9,[r1 + 4]
LDBYTE r10,[r1 + 6]
ST32BIT [r1],r4
ST16BIT [r1 + 4],r5
STBYTE [r1 + 6],r6
ST32BIT [r0],r8
ST16BIT [r0 + 4],r9
STBYTE [r0 + 6],r10
// Clean up and return
...

// Etc. for Other tail byte counts % 8 (8 variants)
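
For readers who prefer C to assembler, the same structure (load a block from each side, store them crosswise, then handle the tail) can be sketched as follows. This is only an approximation of the listing above: swap_blocks and BLOCK are hypothetical names, the 32-byte block mirrors the four 8-byte registers per side, and the cache hints and alignment handling are left out:

```c
#include <stddef.h>
#include <string.h>

enum { BLOCK = 32 };  /* mirrors 4 x 8-byte registers per side */

/* Hypothetical name; not a standard function. */
void swap_blocks(void *p1, void *p2, size_t n)
{
    unsigned char *a = p1, *b = p2;
    unsigned char ta[BLOCK], tb[BLOCK];

    if (a == b)
        return;

    /* Main loop: corresponds to InnerLoopHuge/InnerLoopMedium. */
    while (n >= BLOCK) {
        memcpy(ta, a, BLOCK);  /* loads from the first buffer   */
        memcpy(tb, b, BLOCK);  /* loads from the second buffer  */
        memcpy(b, ta, BLOCK);  /* stores to the second buffer   */
        memcpy(a, tb, BLOCK);  /* stores to the first buffer    */
        a += BLOCK;
        b += BLOCK;
        n -= BLOCK;
    }

    /* Tail: corresponds to the TailNBytes variants, done generically. */
    if (n > 0) {
        memcpy(ta, a, n);
        memcpy(tb, b, n);
        memcpy(b, ta, n);
        memcpy(a, tb, n);
    }
}
```

Whether a compiler keeps ta and tb in registers depends on the target and optimization level, which is the reason the post above suggests hand-tuned library code.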
Post by Thiago Adams
https://gustedt.wordpress.com/2010/10/23/a-generic-swap-implementation/
Nice, but that's a lot of filler syntax oddities, at least some of
which could be replaced by _Static_assert(), and it's still not good for
arbitrarily large blocks that won't fit in a register.

A quality implementation would recognize calls to the standard
memswap() via some pragma or syntax extension in the supplied
<string.h>, then generate similar quick register code for small sizes
(such as primitive types).

For example one vendor's <string.h> might contain

#pragma VENDORNAME_standardimpl memswap "memswap"
void memswap(void *p1, void *p2, size_t siz);

Where that pragma tells this vendor's own compiler that in this
compilation unit, the unquoted memswap identifier passed to the pragma
refers to the vendor's own memswap(), or an equivalent implementation, and
calls to it can be optimized accordingly. Also, even though p1 and p2
are not declared restrict (because they are allowed to be identical),
the compiler is still allowed to generate a diagnostic if the buffers
obviously overlap.





Enjoy

Jakob
--
Jakob Bohm, CIO, Partner, WiseMo A/S. https://www.wisemo.com
Transformervej 29, 2860 Søborg, Denmark. Direct +45 31 13 16 10
This public discussion message is non-binding and may contain errors.
WiseMo - Remote Service Management for PCs, Phones and Embedded
Thiago Adams
2016-12-22 11:35:58 UTC
Permalink
On Wednesday, December 21, 2016 at 7:02:21 PM UTC-2, Jakob Bohm

[...]
Post by Jakob Bohm
A quality implementation would recognize calls to the standard
memswap() via some pragma or syntax extension in the supplied
<string.h>, then generate similar quick register code for small sizes
(such as primitive types).
For example one vendor's <string.h> might contain
#pragma VENDORNAME_standardimpl memswap "memswap"
void memswap(void *p1, void *p2, size_t siz);
Where that pragma tells this vendors own compiler that in this
compilation unit, the unquoted memswap identifier passed to the pragma
refers to the vendors own memswap(), or equivalent implementation and
calls to it can be optimized accordingly. Also, even though p1 and p2
are not declared restrict (because they are allowed to be identical),
the compiler is still allowed to generate a diagnostic if the buffers
obviously overlap.
Note that we are talking about some compiler help again.
Jakob Bohm
2016-12-22 14:51:37 UTC
Permalink
Post by Thiago Adams
On Wednesday, December 21, 2016 at 7:02:21 PM UTC-2, Jakob Bohm
[...]
Post by Jakob Bohm
A quality implementation would recognize calls to the standard
memswap() via some pragma or syntax extension in the supplied
<string.h>, then generate similar quick register code for small sizes
(such as primitive types).
For example one vendor's <string.h> might contain
#pragma VENDORNAME_standardimpl memswap "memswap"
void memswap(void *p1, void *p2, size_t siz);
Where that pragma tells this vendors own compiler that in this
compilation unit, the unquoted memswap identifier passed to the pragma
refers to the vendors own memswap(), or equivalent implementation and
calls to it can be optimized accordingly. Also, even though p1 and p2
are not declared restrict (because they are allowed to be identical),
the compiler is still allowed to generate a diagnostic if the buffers
obviously overlap.
Note, that we are talking about some compiler help again.
But not of the mythic/problematic "try to guess intention from
algorithm and apply optimizations that might interfere with the
principle of least surprise, causing bugs in programs that expect
normal language behavior" kind.

Just the regular, run-of-the-mill inlining of intrinsic/well-known C
library functions in cases that can be simplified without resorting to
such things as "strict aliasing" assumptions.

A first order implementation would just have a regular, lightly
optimized portable C implementation in the C runtime and be done with
it.

A second order implementation could inline calls with a small fixed
count (as they already do for memcpy). The library implementation
could also be optimized via independent work in the library writing
team.

A third order implementation could add special rules beyond the basic
inlining, such as the warnings for overlapping buffers. It's all in
the realm of basic gradual improvement with no surprises for
implementors or users.



Enjoy

Jakob
--
Jakob Bohm, CIO, Partner, WiseMo A/S. https://www.wisemo.com
Transformervej 29, 2860 Søborg, Denmark. Direct +45 31 13 16 10
This public discussion message is non-binding and may contain errors.
WiseMo - Remote Service Management for PCs, Phones and Embedded
Keith Thompson
2016-12-22 17:17:29 UTC
Permalink
Jakob Bohm <jb-***@wisemo.com> writes:
[...]
Post by Jakob Bohm
A quality implementation would recognize calls to the standard
memswap() via some pragma or syntax extension in the supplied
<string.h>, then generate similar quick register code for small sizes
(such as primitive types).
If memswap() were a standard library function, no pragma or
syntax extension would be necessary (though it might be useful).
A compiler would be permitted to recognize that memswap() is a
standard function, and generate any code it likes that implements
the required semantics for that particular call -- just as it can
do now for memcpy(), memset(), etc.
--
Keith Thompson (The_Other_Keith) kst-***@mib.org <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
Richard Bos
2016-12-27 11:04:10 UTC
Permalink
Post by Keith Thompson
[...]
Post by Jakob Bohm
A quality implementation would recognize calls to the standard
memswap() via some pragma or syntax extension in the supplied
<string.h>, then generate similar quick register code for small sizes
(such as primitive types).
If memswap() were a standard library function, no pragma or
syntax extension would be necessary (though it might be useful).
A compiler would be permitted to recognize that memswap() is a
standard function, and generate any code it likes that implements
the required semantics for that particular call -- just as it can
do now for memcpy(), memset(), etc.
And, nota bene, already often does. This is not a theoretical option
Keith has just dreamt up for the sake of an argument, this is already a
real advantage.

Richard
Jakob Bohm
2016-12-27 16:51:35 UTC
Permalink
Post by Richard Bos
Post by Keith Thompson
[...]
Post by Jakob Bohm
A quality implementation would recognize calls to the standard
memswap() via some pragma or syntax extension in the supplied
<string.h>, then generate similar quick register code for small sizes
(such as primitive types).
If memswap() were a standard library function, no pragma or
syntax extension would be necessary (though it might be useful).
A compiler would be permitted to recognize that memswap() is a
standard function, and generate any code it likes that implements
the required semantics for that particular call -- just as it can
do now for memcpy(), memset(), etc.
And, nota bene, already often does. This is not a theoretical option
Keith has just dreamt up for the sake of an argument, this is already a
real advantage.
However with memswap being a new addition, decades after the others
were added, there is a high likelihood that a non-zero number of
somewhat important existing programs have conflicting "local"
definitions of memswap(). Providing a mechanism such as the one
illustrated helps avoid that problem, by not munging calls to the
"local" memswap() and by allowing a privately modified copy of string.h
to provide standard memswap under a different non-conflicting name.

It is a way to both generalize the concept of compiler-known function
and at the same time simplify the compiler logic needed to safely
recognize variants of built-in functions, all behind the scenes for
ordinary programs (as the explicit pragma occurs only in vendor-
supplied header files by default).

I recall older versions of the Borland compiler having a somewhat
similar pragma to tell it which of the runtime functions should be
implemented as intrinsics; it came in quite handy a few times.


Enjoy

Jakob
--
Jakob Bohm, CIO, Partner, WiseMo A/S. https://www.wisemo.com
Transformervej 29, 2860 Søborg, Denmark. Direct +45 31 13 16 10
This public discussion message is non-binding and may contain errors.
WiseMo - Remote Service Management for PCs, Phones and Embedded
Keith Thompson
2016-12-27 16:58:41 UTC
Permalink
Post by Jakob Bohm
Post by Richard Bos
Post by Keith Thompson
[...]
Post by Jakob Bohm
A quality implementation would recognize calls to the standard
memswap() via some pragma or syntax extension in the supplied
<string.h>, then generate similar quick register code for small sizes
(such as primitive types).
If memswap() were a standard library function, no pragma or
syntax extension would be necessary (though it might be useful).
A compiler would be permitted to recognize that memswap() is a
standard function, and generate any code it likes that implements
the required semantics for that particular call -- just as it can
do now for memcpy(), memset(), etc.
And, nota bene, already often does. This is not a theoretical option
Keith has just dreamt up for the sake of an argument, this is already a
real advantage.
However with memswap being a new addition, decades after the others
were added, there is a high likelihood that a non-zero number of
somewhat important existing programs have conflicting "local"
definitions of memswap(). Providing a mechanism such as the one
illustrated helps avoid that problem, by not munging calls to the
"local" memswap() and by allowing a privately modified copy of string.h
to provide standard memswap under a different non-conflicting name.
All identifiers starting with "mem" are already reserved. N1570
7.31.13:

Function names that begin with str, mem, or wcs and a lowercase
letter may be added to the declarations in the <string.h> header.

Nevertheless, there are some existing memswap() functions -- but as far
as I can tell there aren't very many, which might suggest that there's
not much demand for the functionality.

[...]
--
Keith Thompson (The_Other_Keith) kst-***@mib.org <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"