Discussion:
SQL
priya1
2006-01-10 17:26:37 UTC
What are the advantages of SQL over third-generation languages?
Mikito Harakiri
2006-01-10 17:42:00 UTC
Post by priya1
What are the advantages of SQL over third-generation languages?
Here are a couple of paragraphs from the preface of my advanced SQL
programming book (due in May):

<quote>
SQL is a very successful language. This might be surprising to a
newcomer, who generally finds SQL a little old-fashioned compared to
"modern" programming languages. It's almost as old as COBOL and it
looks like FORTRAN; why isn't it obsolete yet? Let me assure the reader
that this appearance is misleading. Under the cover of sloppy and
archaic syntax we find a high-abstraction language.

SQL programming is very unusual from a procedural perspective: there is
no explicit flow control, no loops, and no variables either to apply
operations to or to store intermediate results in. SQL heavily
leverages predicates instead, which elevates it to logic programming.
Then grouping and aggregation syntax blend naturally into this already
formidable logic foundation. Anybody who rediscovers that matrix or
polynomial multiplication can be written in just three lines of code
is left with a lasting impression.
</quote>
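The three-lines claim can be checked with a quick sketch (using Python's sqlite3 driver purely for illustration; the `a` and `b` tables are my own invention): polynomial multiplication is a cross join that adds exponents, plus a GROUP BY that sums coefficient products.

```python
import sqlite3

# Represent each polynomial as (exponent, coefficient) rows.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE a(exp INTEGER, coef INTEGER);  -- 1 + 2x
    CREATE TABLE b(exp INTEGER, coef INTEGER);  -- 3 + x
    INSERT INTO a VALUES (0, 1), (1, 2);
    INSERT INTO b VALUES (0, 3), (1, 1);
""")

# The product: join every term with every term, add exponents,
# and sum the coefficient products per resulting exponent.
rows = con.execute("""
    SELECT a.exp + b.exp AS exp, SUM(a.coef * b.coef) AS coef
    FROM a, b
    GROUP BY a.exp + b.exp
    ORDER BY exp
""").fetchall()

print(rows)  # (1 + 2x)(3 + x) = 3 + 7x + 2x^2  ->  [(0, 3), (1, 7), (2, 2)]
```

The SELECT itself really is three lines; no loops or index variables appear anywhere.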
Oliver Wong
2006-01-10 17:57:59 UTC
Post by Mikito Harakiri
Post by priya1
What are the advantages of SQL over third-generation languages?
Here are a couple of paragraphs from the preface of my advanced SQL
<quote>
SQL is a very successful language. This might be surprising to a
newcomer, who generally finds SQL a little old-fashioned compared to
"modern" programming languages. It's almost as old as COBOL and it
looks like FORTRAN; why isn't it obsolete yet? Let me assure the reader
that this appearance is misleading. Under the cover of sloppy and
archaic syntax we find a high-abstraction language.
SQL programming is very unusual from a procedural perspective: there is
no explicit flow control, no loops, and no variables either to apply
operations to or to store intermediate results in. SQL heavily
leverages predicates instead, which elevates it to logic programming.
Then grouping and aggregation syntax blend naturally into this already
formidable logic foundation. Anybody who rediscovers that matrix or
polynomial multiplication can be written in just three lines of code
is left with a lasting impression.
</quote>
I thought SQL was a declarative language, in the sense that you say what
you want, not how to get what you want, and the engine takes care of
figuring out the optimal implementation for filling your request.
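That distinction can be made concrete with a small sketch (Python's sqlite3; the `emp` table is invented for illustration): the SQL statement states *what* rows are wanted, while the loop below it spells out *how* to collect them, including the intermediate state.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE emp(name TEXT, dept TEXT, salary INTEGER);
    INSERT INTO emp VALUES ('ann', 'dev', 90), ('bob', 'dev', 80),
                           ('cat', 'hr', 70);
""")

# Declarative: describe the result; the engine picks the access path.
declarative = con.execute(
    "SELECT name FROM emp WHERE dept = 'dev' AND salary > 85"
).fetchall()

# Procedural: spell out the iteration and the intermediate state.
procedural = []
for name, dept, salary in con.execute("SELECT name, dept, salary FROM emp"):
    if dept == 'dev' and salary > 85:
        procedural.append((name,))

assert declarative == procedural  # same result, different level of abstraction
```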

Is SQL a Turing Complete language?

- Oliver
Mikito Harakiri
2006-01-10 18:30:34 UTC
Post by Oliver Wong
I thought SQL was a declarative language, in the sense that you say what
you want, not how to get what you want, and the engine takes care of
figuring out the optimal implementation for filling your request.
This is correct. It doesn't contradict what I wrote, however.
Post by Oliver Wong
Is SQL a Turing Complete language?
Depends what features are allowed. Basic relational algebra enhanced
with subqueries, aggregation, and grouping (which is considered core SQL)
is not Turing complete. With recursion it is. ANSI standard SQL is
overloaded with features; there are even spreadsheet-like
extensions which introduce intermediate variables. Many people consider
those procedural-looking extensions ugly.
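The recursion point can be illustrated concretely (Python's sqlite3 here; SQLite implements the SQL:1999 recursive WITH form): a recursive common table expression expresses the kind of iteration that core select/join/aggregate SQL cannot.

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Factorials via a recursive common table expression: each row is
# derived from the previous one, i.e. genuine iteration in one query.
rows = con.execute("""
    WITH RECURSIVE fact(n, f) AS (
        SELECT 0, 1
        UNION ALL
        SELECT n + 1, f * (n + 1) FROM fact WHERE n < 5
    )
    SELECT n, f FROM fact
""").fetchall()

print(rows)  # [(0, 1), (1, 1), (2, 2), (3, 6), (4, 24), (5, 120)]
```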
topmind
2006-01-11 03:08:10 UTC
Post by Mikito Harakiri
Post by priya1
What are the advantages of SQL over third-generation languages?
Here are a couple of paragraphs from the preface of my advanced SQL
<quote>
SQL is a very successful language. This might be surprising to a
newcomer, who generally finds SQL a little old-fashioned compared to
"modern" programming languages. It's almost as old as COBOL and it
looks like FORTRAN,
SQL is about 15 years younger than COBOL, so "almost as old" is
misleading. FORTRAN? I think SQL looks more like COBOL than FORTRAN.

Take that book back now; don't wait for May :-)
Post by Mikito Harakiri
why isn't it obsolete yet? Let me assure the reader
that this appearance is misleading. Under the cover of sloppy and
archaic syntax we find a high-abstraction language.
What I don't get is why other relational languages don't make a run at
SQL. The only serious attempt is from the Date group (Tutorial D). (I
have also drafted a relational language myself, tentatively called
SMEQL.)

People make a jillion procedural and OOP languages, but there are only
2 or so SQL competitors, and they are in draft mode for the most part.
Why the big diff?
Post by Mikito Harakiri
SQL programming is very unusual from a procedural perspective: there is
no explicit flow control, no loops, and no variables either to apply
operations to or to store intermediate results in. SQL heavily
leverages predicates instead, which elevates it to logic programming.
Then grouping and aggregation syntax blend naturally into this already
formidable logic foundation. Anybody who rediscovers that matrix or
polynomial multiplication can be written in just three lines of code
is left with a lasting impression.
</quote>
-T-
Mikito Harakiri
2006-01-11 18:32:44 UTC
Post by topmind
Post by Mikito Harakiri
Post by priya1
What are the advantages of SQL over third-generation languages?
Here are a couple of paragraphs from the preface of my advanced SQL
<quote>
SQL is a very successful language. This might be surprising to a
newcomer, who generally finds SQL a little old-fashioned compared to
"modern" programming languages. It's almost as old as COBOL and it
looks like FORTRAN,
SQL is about 15 years younger than COBOL, so "almost as old" is
misleading.
Wikipedia: "COBOL was initially created in 1959"
Codd's seminal paper "A Relational Model of Data for Large Shared Data
Banks" is dated 1970.
Post by topmind
What I don't get is why other relational languages don't make a run at
SQL. The only serious attempt is from the Date group (Tutorial D). (I
have also drafted a relational language myself, tentatively called
SMEQL.)
There was a discussion at /. Synopsis:
"What about newer relational languages? Why, for example, didn't the
latest and greatest Tutorial D take over the world? It has improved
notation, better NULL handling, and pure set semantics. This is not
enough, however; a New and Improved query language had better be leaps
and bounds ahead, not just simpler to type."
Mikito Harakiri
2006-01-11 21:39:54 UTC
Mikito Harakiri wrote:
It's almost as old as COBOL and it
Post by Mikito Harakiri
Post by topmind
Post by Mikito Harakiri
looks like FORTRAN,
SQL is about 15 years younger than COBOL, so "almost as old" is
misleading.
Wikipedia: "COBOL was initially created in 1959"
Codd's seminal paper "A Relational Model of Data for Large Shared Data
Banks" is dated 1970.
Well, the time-precedence typo in your message aside, the 15-year gap
is just about right. Perhaps the better phrasing is:

"It's almost as old as C (which spawned at least 3 newer-generation
languages) and it looks like COBOL; why isn't it obsolete yet?"

Thank you for the correction.
topmind
2006-01-13 02:20:17 UTC
Post by Mikito Harakiri
Post by topmind
Post by Mikito Harakiri
Post by priya1
What are the advantages of SQL over third-generation languages?
Here are a couple of paragraphs from the preface of my advanced SQL
<quote>
SQL is a very successful language. This might be surprising to a
newcomer, who generally finds SQL a little old-fashioned compared to
"modern" programming languages. It's almost as old as COBOL and it
looks like FORTRAN,
SQL is about 15 years younger than COBOL, so "almost as old" is
misleading.
Wikipedia: "COBOL was initially created in 1959"
Codd's seminal paper "A Relational Model of Data for Large Shared Data
Banks" is dated 1970.
Codd did *not* create SQL, contrary to popular belief. Early test
relational languages were more math-like. IBM later decided to COBOLify
them, allegedly thinking that made them more palatable to managers
(the Dilbertian PHB view).
Post by Mikito Harakiri
Post by topmind
What I don't get is why other relational languages don't make a run at
SQL. The only serious attempt is from the Date group (Tutorial D). (I
have also drafted a relational language myself, tentatively called
SMEQL.)
"What about newer relational languages? Why, for example, didn't the
latest and greatest Tutorial D take over the world? It has improved
notation, better NULL handling, and pure set semantics. This is not
enough, however; a New and Improved query language had better be leaps
and bounds ahead, not just simpler to type."
I generally agree, but for those who do queries all day a somewhat
better language would be worth it. I think there is room for a good
"mathy" standard for power query warriors, and the more verbose,
English-like SQL for occasional users.

Actually, if the SQL standard(s) added named virtual views, that would
be a huge leap forward for more complicated queries. That way you could
name sub-queries instead of having to nest them. Big SQL is too damned
nesty, leading to run-on sentences.
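Standard SQL did eventually grow something close to this: the WITH clause (common table expressions, added in SQL:1999) lets a sub-query be named instead of nested. A sketch via sqlite3, with an invented `sales` table:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE sales(region TEXT, amount INTEGER);
    INSERT INTO sales VALUES ('east', 10), ('east', 30),
                             ('west', 5), ('west', 7);
""")

# Nested form: the sub-query is anonymous and buried in the FROM clause.
nested = con.execute("""
    SELECT MAX(total) FROM
        (SELECT region, SUM(amount) AS total FROM sales GROUP BY region)
""").fetchone()

# Named form: the same sub-query gets a name, so the query reads top-down.
named = con.execute("""
    WITH region_totals AS (
        SELECT region, SUM(amount) AS total FROM sales GROUP BY region
    )
    SELECT MAX(total) FROM region_totals
""").fetchone()

assert nested == named == (40,)
```

Both forms compute the same result; the named form reads top-down instead of inside-out.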

But there are things that SQL is still crappy at even with that fix.
For example, often you want to say "get all columns except these few".
In SQL you have to name them all, minus the excluded ones. SMEQL allows
one to use the query language's built-in set tools to "calculate"
column lists so that you can specify all columns minus another list.
For stuff like this, the "ME" stands for "meta enabled".
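Standard SQL indeed has no "all columns except these" form, so the usual workaround is to compute the column list outside the query. A sketch of that metaprogramming step in Python over sqlite3 (the table, the helper name, and its behaviour are my own illustration, not SMEQL):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE emp(id INTEGER, name TEXT, salary INTEGER, ssn TEXT)")
con.execute("INSERT INTO emp VALUES (1, 'ann', 90, 'x')")

def select_all_except(con, table, excluded):
    # Read the catalogue, subtract the excluded names, build the query.
    cols = [row[1] for row in con.execute(f"PRAGMA table_info({table})")]
    keep = [c for c in cols if c not in excluded]
    return con.execute(f"SELECT {', '.join(keep)} FROM {table}")

rows = select_all_except(con, "emp", {"ssn", "salary"}).fetchall()
print(rows)  # [(1, 'ann')]
```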

-T-
JXStern
2006-01-15 17:58:17 UTC
Post by topmind
What I don't get is why other relational languages don't make a run at
SQL. The only serious attempt is from the Date group (Tutorial D). (I
have also drafted a relational language myself, tentatively called
SMEQL.)
People make a jillion procedural and OOP languages, but there are only
2 or so SQL competitors, and they are in draft mode for the most part.
Why the big diff?
I can answer that. It's the cost of entry. Any schmuck can define a
new scripting language and throw something out there in a matter of
weeks, and it will compete with other scripting languages on a more or
less equal basis. For any database language, you need an engine,
which is not rocket science but is a solid piece of work to code, and
besides a simple parser you NEED an optimizer, which major 3GL
compilers have, but scripting languages don't. I've recently been
doing a lot of database optimization work, and what that turns out to
be (on SQL Server) is understanding how its optimizer works, and
finding when it breaks down (which can happen very often under certain
scenarios) - and then getting it back into proper operation. MySQL
has never been a real competitor because it lacked the optimization,
much less the language features, of larger engines.

That said, I was also just ruminating on how ambiguous SQL is from a
semantic perspective, for starters in that joins may either expand or
constrain a selection. If the intent of query statements was more
evident, it would be easier to optimize properly. I was just Googling
up what I could of QUEL, and wondering what I would do for a Josh's
Own Version of a Structured Query Language. So, yes, I second your
question, why more of this hasn't been done, by Oracle and Microsoft
their own selves, if nobody else (well, Microsoft isn't exactly known
for such innovations, and their engine is not open to modification,
and Oracle is fat and happy with PL/SQL and Java applets). Maybe
somebody can branch off MySQL in such a direction. Did Stonebraker
ever spend much time on the language side? Not that I know of.
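The expand-or-constrain point is easy to demonstrate (sqlite3; invented tables): the same JOIN syntax shrinks the result when the match filters rows out, and grows it when the match is one-to-many.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE dept(id INTEGER, name TEXT);
    CREATE TABLE emp(dept_id INTEGER, name TEXT);
    INSERT INTO dept VALUES (1, 'dev'), (2, 'hr');
    INSERT INTO emp VALUES (1, 'ann'), (1, 'bob'), (1, 'cat');
""")

base = con.execute("SELECT COUNT(*) FROM dept").fetchone()[0]      # 2 rows

# Constraining join: 'hr' has no employees, so it drops out entirely.
constrained = con.execute(
    "SELECT COUNT(*) FROM dept JOIN emp ON emp.dept_id = dept.id "
    "WHERE dept.name = 'hr'"
).fetchone()[0]                                                    # 0 rows

# Expanding join: 'dev' matches three employees, so one row becomes three.
expanded = con.execute(
    "SELECT COUNT(*) FROM dept JOIN emp ON emp.dept_id = dept.id"
).fetchone()[0]                                                    # 3 rows

assert (base, constrained, expanded) == (2, 0, 3)
```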

So, back to OP, I'd say the difference between SQL and 3GLs is in the
implied execution engines, which implies a different semantic approach
to the entire topic of computation.

But if that was a homework question, all they're looking for is the
word "nonprocedural" - not that I'd call that an "advantage" as such;
see parallel post.

J.
JXStern
2006-01-15 17:45:22 UTC
On 10 Jan 2006 09:42:00 -0800, "Mikito Harakiri"
Post by Mikito Harakiri
Here are a couple of paragraphs from the preface of my advanced SQL
<quote>
SQL is a very successful language.
Mostly as an insert into extended, and as yet non-standard, versions
from Oracle, Microsoft, and others.
Post by Mikito Harakiri
This might be surprising to a
newcomer, who generally finds SQL a little old-fashioned compared to
"modern" programming languages.
Old-fashioned? Pure SQL is purely non-procedural, of which there is
only a whiff in most modern languages, along the lines of mapping
predicates, and that's hardly the same thing.

But generally, richer predicates are what people add to simple 3GLs in
order to make things SOTA. And SQL has some data-centric and semantic
aspects which make it more of an early 5GL than anything yet out there
and popular.
Post by Mikito Harakiri
It's almost as old as COBOL and it
looks like FORTRAN; why isn't it obsolete yet?
Same quibble as topmind: COBOL was in use in 1959; SQL wasn't a
practical language until, oh, the late 1980s.
Post by Mikito Harakiri
Let me assure the reader
that this appearance is misleading. Under the cover of sloppy and
archaic syntax we find a high-abstraction language.
I'm not sure you want to call a language "sloppy"; the use might be
sloppy but not the definition. Similarly, I'm not sure what "archaic
syntax" could possibly mean, forsooth.
Post by Mikito Harakiri
SQL programming is very unusual from a procedural perspective: there is
no explicit flow control, no loops, and no variables either to apply
operations to or to store intermediate results in.
Without some variable extensions, it wouldn't be nearly as popular as
it is today.

And, you know, RPG was non-procedural before FORTRAN; in fact,
plugboard programming was non-procedural before any high-level
language! COBOL's report writer was non-procedural, and I believe that
goes back to the original definition - it was just refined RPG, after
all!

J.
H. S. Lahman
2006-01-11 16:21:20 UTC
Post by priya1
What are the advantages of SQL over third-generation languages?
Sounds like another homework problem.

SQL /is/ a 3GL. It is just highly specialized to support Data Modeling.
If you want to model data and access it in a very generic fashion,
then SQL is pretty much the only game in town.

OTOH, if you are solving problems where persistence access is of
secondary concern (i.e., applications beyond RAD CRUD/USER pipelines),
then SQL is not a very good language to use because it doesn't handle
dynamic issues well, doesn't support abstraction, is not very
maintainable, and is just plain ugly. So in that case one should
limit SQL usage to the implementation of a subsystem whose mission is to
actually talk to the persistence mechanisms.


*************
There is nothing wrong with me that could
not be cured by a capful of Drano.

H. S. Lahman
***@pathfindermda.com
Pathfinder Solutions -- Put MDA to Work
http://www.pathfindermda.com
blog: http://pathfinderpeople.blogs.com/hslahman
(888)OOA-PATH
topmind
2006-01-14 02:52:54 UTC
Post by H. S. Lahman
Post by priya1
What are the advantages of SQL over third-generation languages?
Sounds like another homework problem.
SQL /is/ a 3GL.
Depends on how you define GLs. What do you consider an example of a
4GL?
Post by H. S. Lahman
It is just highly specialized to support Data Modeling.
If you want to model data and access it in a very generic fashion,
then SQL is pretty much the only game in town.
OTOH, if you are solving problems where persistence access is of
secondary concern (i.e., applications beyond RAD CRUD/USER pipelines),
then SQL is not a very good language to use because it doesn't handle
dynamic issues well, doesn't support abstraction, it is not very
maintainable, and it is just plain ugly.
Are you talking about SQL specifically, or relational in general?
"Doesn't support abstraction" is a rather sweeping claim.

I am trying to see if there is likely to be yet another
OO-vs-Relational battle breaking out here.
Post by H. S. Lahman
So in that case one should
limit SQL usage to the implementation of a subsystem whose mission is to
actually talk to the persistence mechanisms.
*************
There is nothing wrong with me that could
not be cured by a capful of Drano.
H. S. Lahman
-T-
H. S. Lahman
2006-01-14 16:31:41 UTC
Responding to Jacobs...
Post by topmind
Post by H. S. Lahman
SQL /is/ a 3GL.
Depends on how you define GL's. What do you consider an example of a
4GL?
There are a number of definitions. The one I find most useful is that a
4GL solution is independent of particular computing-environment
implementations. IOW, the 4GL expresses solutions purely in problem
space terms. That lets SQL out because it depends upon the specific
RDB implementation of the RDM.

UML with a compliant AAL is an example of a 4GL. If I build an OOA
model for, say, a POS Order Entry System, that model can be
unambiguously implemented without change either manually as a print mail
order catalogue or as software for a browser-based 'net application.
The fundamental processing logic of catalogue organization and order
entry is expressed the same way regardless of the implementation context.
Post by topmind
Post by H. S. Lahman
It is just highly specialized to support Data Modeling.
If you want to model data and access it in a very generic fashion,
then SQL is pretty much the only game in town.
OTOH, if you are solving problems where persistence access is of
secondary concern (i.e., applications beyond RAD CRUD/USER pipelines),
then SQL is not a very good language to use because it doesn't handle
dynamic issues well, doesn't support abstraction, it is not very
maintainable, and it is just plain ugly.
Are you talking about SQL specifically, or relational in general?
"Doesn't support abstraction" is a rather sweeping claim.
SQL specifically. IMO it isn't a very good language even for
relational. It survives because of the huge volumes of legacy code
around. Note that when you do P/R you don't program solely in SQL; you
just use SQL to talk to the RDB.

[One could argue that "bad" languages are/were very successful /because/
they were inelegant. For example, the inelegant verbosity and tedious
simplicity of COBOL dominated software development for three decades
because it worked well under the constraints of practical development.]

SQL does support abstraction in the sense that it abstracts the RDB
implementation. However, here I was referring to problem space abstraction.
Post by topmind
I am trying to see if there is likely to be yet another
OO-vs-Relational battle breaking out here.
Only indirectly. CRUD/USER pipeline applications have architectures and
infrastructures that are hard-wired around the RDB implementation of the
RDM. That is usually not the case for OO applications solving more
complex problems. Note that UML static diagrams also implement the RDM
but they do so in a very different way than the RDBs do.

[That difference in the way the RDM is applied is one reason why OOA/D
objects in a non-CRUD/USER application model will often not map 1:1 with
the RDB Data Model's tables. So in such applications one tends to
isolate RDB access in a single subsystem so that the conversion rules
can be encapsulated in one place. (The fact that both model the same
problem space and both employ the RDM in some fashion ensures that there
will be unambiguous conversion rules.)]


*************
There is nothing wrong with me that could
not be cured by a capful of Drano.

H. S. Lahman
***@pathfindermda.com
Pathfinder Solutions -- Put MDA to Work
http://www.pathfindermda.com
blog: http://pathfinderpeople.blogs.com/hslahman
(888)OOA-PATH
topmind
2006-01-14 19:56:31 UTC
Post by H. S. Lahman
Responding to Jacobs...
Post by topmind
Post by H. S. Lahman
SQL /is/ a 3GL.
Depends on how you define GL's. What do you consider an example of a
4GL?
There are a number of definitions. The one I find most useful is that a
4GL solution is independent of particular computing-environment
implementations. IOW, the 4GL expresses solutions purely in problem
space terms. That lets SQL out because it depends upon the specific
RDB implementation of the RDM.
What you are talking about is not a limit, but a "problem" of choice.
If the *only* RDBMS on the planet were, say, PostgreSQL, then such a
complaint would no more exist than it does for Python having only one
implementation of its interpreter. Nobody ever claimed that Python was
a lesser-generation language because there was only one
implementation. One can run PostgreSQL on Unix, Macs (IIRC), PCs,
etc., I would note.

Why demote the rank of something simply because there are choices?
Post by H. S. Lahman
UML with a compliant AAL is an example of a 4GL. If I build an OOA
model for, say, a POS Order Entry System, that model can be
unambiguously implemented without change either manually as a print mail
order catalogue or as software for a browser-based 'net application.
The fundamental processing logic of catalogue organization and order
entry is expressed the same way regardless of the implementation context.
And if other people/vendors made their own flavors of this tool with
differences between the implementations, then it would be in the same
boat. Why should implementations A1 and A2 demote the "generation"
ranking of A?
Post by H. S. Lahman
Post by topmind
Post by H. S. Lahman
It is just highly specialized to support Data Modeling.
If you want to model data and access it in a very generic fashion,
then SQL is pretty much the only game in town.
OTOH, if you are solving problems where persistence access is of
secondary concern (i.e., applications beyond RAD CRUD/USER pipelines),
then SQL is not a very good language to use because it doesn't handle
dynamic issues well, doesn't support abstraction, it is not very
maintainable, and it is just plain ugly.
Are you talking about SQL specificly, or relational in general?
"Doesn't support abstraction" is a rather sweeping claim.
SQL specifically. IMO it isn't a very good language even for
relational.
I agree that it needs improvement/overhaul. However, it is still a
powerful tool even without fixes.
Post by H. S. Lahman
It survives because of the huge volumes of legacy code
around. Note that when you do P/R you don't program solely in SQL; you
just use SQL to talk to the RDB.
That is divide and conquer: letting code do what it does best and the
DB do what it does best.
Post by H. S. Lahman
[One could argue that "bad" languages are/were very successful /because/
they were inelegant. For example, the inelegant verbosity and tedious
simplicity of COBOL dominated software development for three decades
because it worked well under the constraints of practical development.]
IMO COBOL survived largely because it was tuned to the domain, not
because of its English-like syntax.
Post by H. S. Lahman
SQL does support abstraction in the sense that it abstracts the RDB
implementation. However, here I was referring to problem space abstraction.
This depends on how one defines "problem space abstraction". The OO
view of PSA is kinda lame if you ask me. It does not factor out common
"database verbs" into a single tool or convention, but reinvents them
over and over for many classes. Repetitive SET/GET syndrome is an
example of this poor pattern factoring. OO'ers often don't see this
ugly duplication of concept.
Post by H. S. Lahman
Post by topmind
I am trying to see if there is likely to be yet another
OO-vs-Relational battle breaking out here.
Only indirectly. CRUD/USER pipeline applications have architectures and
infrastructures that are hard-wired around the RDB implementation of the
RDM. That is usually not the case for OO applications solving more
complex problems.
We have been over this already. There is no objective measurement that
says biz apps are "simpler". I will agree that it is usually easier to
learn business domains than, say, 3D graphics or rocket science, but
learning the domain and implementing apps for it are two different
things.

You are participating in Domain Bigotry here.
Post by H. S. Lahman
Note that UML static diagrams also implement the RDM
but they do so in a very different way than the RDBs do.
UML takes us back to the navigational/CODASYL pointer/path hell that
proved a mess in the 60s and 70s. UML and OO are the structural GO TO
of the modern age.
Post by H. S. Lahman
[That difference in the way the RDM is applied is one reason why OOA/D
objects in a non-CRUD/USER application model will often not map 1:1 with
the RDB Data Model's tables. So in such applications one tends to
isolate RDB access in a single subsystem so that the conversion rules
can be encapsulated in one place. (The fact that both model the same
problem space and both employ the RDM in some fashion ensures that there
will be unambiguous conversion rules.)]
I will agree that "local views" are often needed. However, I don't see
how OO helps with this. (Except perhaps that current languages usually
don't support local table manipulation; but that is a fault of
implementors, not P/R. It used to be common until the OO hype made
vendors yank it out, becoming a self-fulfilling prophecy.)
Post by H. S. Lahman
*************
There is nothing wrong with me that could
not be cured by a capful of Drano.
H. S. Lahman
-T-
Patrick May
2006-01-15 13:50:15 UTC
Post by topmind
Post by H. S. Lahman
SQL does support abstraction in the sense that it abstracts the
RDB implementation. However, here I was referring to problem
space abstraction.
This depends on how one defines "problem space abstraction".
"Problem space abstraction" typically refers to a concept from
the domain for which the software system is being developed,
e.g. Customer, Sales Order, Network Element, Product, etc. This is
distinguished from "solution space abstractions" such as tables, rows,
columns, keys, pointers, functions, and so on. This isn't a point of
contention among experienced software developers.
Post by topmind
The OO view of PSA is kinda lame if you ask me. It does not factor
out common "database verbs" into a single tool or convention, but
reinvents it over and over for many classes.
I can't make sense of this assertion as stated. Could you
provide examples?
Post by topmind
Repetitive SET/GET syndrome is an example of this poor pattern
factoring.
Proliferation of get/set methods is a code smell. Immutable
objects are to be preferred.
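That preference can be sketched in Python (illustrative only; the class and its fields are invented): a frozen dataclass exposes its fields directly and replaces mutation with construction of a new value, so no get/set pairs are needed.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Order:
    # No get/set pairs: fields are read directly, and the frozen flag
    # makes every instance immutable after construction.
    customer: str
    quantity: int

o1 = Order("acme", 3)
o2 = replace(o1, quantity=5)   # "change" by building a new value

assert o1.quantity == 3 and o2.quantity == 5

mutated = True
try:
    o1.quantity = 7            # mutation is rejected
except AttributeError:         # dataclasses.FrozenInstanceError
    mutated = False
assert not mutated
```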
Post by topmind
Post by H. S. Lahman
Post by topmind
I am trying to see if there is likely to be yet another
OO-vs-Relational battle breaking out here.
Only indirectly. CRUD/USER pipeline applications have
architectures and infrastructures that are hard-wired around the
RDB implementation of the RDM. That is usually not the case for
OO applications solving more complex problems.
We have been over this already. There is no objective measurement
that says biz apps are "simpler".
Mr. Lahman does not appear to be discussing particular domains.
CRUD/USER applications appear in most domains; after all, even
laboratories developing advanced nanotechnology need to be
able to enter data and generate reports. Correspondingly, not all
business applications are limited to CRUD/USER behavior. As an
example, I'm currently working on a set of systems for a mobile
telephony operator. The systems are definitely providing business
functionality (provisioning, billing, fraud detection, etc.) but the
components are distributed over a large number of physical machines,
interacting with numerous vendors and partners, dealing with multiple
legacy systems, and supporting complex business rules. Performance,
scalability, resiliency, security, recoverability, and other
non-functional requirements are at least as difficult to address as
the core business requirements.

[ . . . ]
Post by topmind
You are participating in Domain Bigotry here.
No, he is simply stating the obvious: CRUD/USER applications are
not particularly complex, especially when compared with other software
systems.

Sincerely,

Patrick

------------------------------------------------------------------------
S P Engineering, Inc. | The experts in large scale distributed OO
| systems design and implementation.
***@spe.com | (C++, Java, Common Lisp, Jini, CORBA, UML)
topmind
2006-01-15 20:30:24 UTC
Post by Patrick May
Post by topmind
Post by H. S. Lahman
SQL does support abstraction in the sense that it abstracts the
RDB implementation. However, here I was referring to problem
space abstraction.
This depends on how one defines "problem space abstraction".
"Problem space abstraction" typically refers to a concept from
the domain for which the software system is being developed,
e.g. Customer, Sales Order, Network Element, Product, etc. This is
distinguished from "solution space abstractions" such as tables, rows,
columns, keys, pointers, functions, and so on. This isn't a point of
contention among experienced software developers.
But how are tables any further from the domain than classes, methods,
and attributes?

(I agree that some of the behavior side is not something to be done on
the DB, but that is simply a partitioning of specialties issue.)
Post by Patrick May
Post by topmind
The OO view of PSA is kinda lame if you ask me. It does not factor
out common "database verbs" into a single tool or convention, but
reinvents it over and over for many classes.
I can't make sense of this assertion as stated. Could you
provide examples?
Perhaps I will return to it later because it is somewhat off-topic.
Post by Patrick May
Post by topmind
Repetitive SET/GET syndrome is an example of this poor pattern
factoring.
Proliferation of get/set methods is a code smell. Immutable
objects are to be preferred.
This is not the consensus in the OO community.
Post by Patrick May
Post by topmind
Post by H. S. Lahman
Post by topmind
I am trying to see if there is likely to be yet another
OO-vs-Relational battle breaking out here.
Only indirectly. CRUD/USER pipeline applications have
architectures and infrastructures that are hard-wired around the
RDB implementation of the RDM. That is usually not the case for
OO applications solving more complex problems.
We have been over this already. There is no objective measurement
that says biz apps are "simpler".
Mr. Lahman does not appear to be discussing particular domains.
CRUD/USER applications appear in most domains; after all, even
laboratories developing advanced nanotechnology need to be
able to enter data and generate reports. Correspondingly, not all
business applications are limited to CRUD/USER behavior. As an
example, I'm currently working on a set of systems for a mobile
telephony operator. The systems are definitely providing business
functionality (provisioning, billing, fraud detection, etc.) but the
components are distributed over a large number of physical machines,
interacting with numerous vendors and partners, dealing with multiple
legacy systems, and supporting complex business rules. Performance,
scalability, resiliency, security, recoverability, and other
non-functional requirements are at least as difficult to address as
the core business requirements.
Agreed. I never said that biz apps were only CRUD screens/reports.

I would note that a lot of the issues you mentioned, such as
performance, scalability, resiliency, and recoverability, can be
obtained by purchasing a "big-iron" RDBMS such as Oracle or DB2. The
configuration and management of those issues is then almost a commodity
skill and not as tied to the domain as a roll-your-own solution (which
OO'ers tend to build) would be.

"Security" is mostly just massive ACL tables.
Post by Patrick May
[ . . . ]
Post by topmind
You are participating in Domain Bigotry here.
No, he is simply stating the obvious: CRUD/USER applications are
not particularly complex, especially when compared with other software
systems.
Again, I have yet to see an objective or even semi-objective way to
measure "complexity". Again, the basic concepts of CRUD are easier to
learn than most other domains or problems, but that does not mean that
the implementation and maintenance of related apps is simple.

For lack of a better metric, I propose lines of code (LOC) for now as
our metric for complexity. "Everybody just knows it" is not good
enough, and questionable. CRUD apps do not take fewer lines of code
(although the skill of the developer can make a huge difference).

Can you make an argument for something being lots of LOC without being
"complex" (beyond learning the domain)? If not, then you are probably
stuck using LOC here.

(And I don't mean bad use of code, such as hard-wiring the database
values into long case statements.)
Post by Patrick May
Sincerely,
Patrick
-T-
Patrick May
2006-01-16 20:27:03 UTC
Permalink
Post by topmind
Post by Patrick May
Post by topmind
Post by H. S. Lahman
SQL does support abstraction in the sense that it abstracts
the RDB implementation. However, here I was referring to
problem space abstraction.
This depends on how one defines "problem space abstraction".
"Problem space abstraction" typically refers to a concept
from the domain for which the software system is being developed,
e.g. Customer, Sales Order, Network Element, Product, etc. This
is distinguished from "solution space abstractions" such as
tables, rows, columns, keys, pointers, functions, and so on. This
isn't a point of contention among experienced software developers.
But how are tables less close to the domain than classes, methods,
and attributes?
The ability to model behavior as well as data makes general
purpose languages better able to model the problem domain than is
SQL.
Post by topmind
(I agree that some of the behavior side is not something to be done
on the DB, but that is simply a partitioning of specialties issue.)
No, it's a qualitative difference.
Post by topmind
Post by Patrick May
Post by topmind
Repetitive SET/GET syndrome is an example of this poor pattern
factoring.
Proliferation of get/set methods is a code smell. Immutable
objects are to be preferred.
This is not the consensus in the OO community.
Yes, it is. Josh Bloch recommends immutability explicitly in
"Effective Java" and gives solid reasons for his position.
Proliferation of getters and setters violates encapsulation, one of
the defining characteristics of object technology. Some research will
show you that OO designs focus on behavior, not state. You should
also check out the Law of Demeter and similar guidelines that provide
further evidence that excessive use of accessors and mutators is not
good OO form.
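To make the immutability point concrete, here is a minimal sketch in Python (the Order type and its fields are hypothetical, invented for illustration): instead of exposing setters, the object is frozen at construction and an "update" returns a new instance.

```python
from dataclasses import dataclass, replace

# Hypothetical domain type for illustration: no setters, state is
# fixed at construction, and "changes" yield new objects.
@dataclass(frozen=True)
class Order:
    order_id: int
    quantity: int

    def with_quantity(self, quantity: int) -> "Order":
        """Return a copy with the new quantity; the original is untouched."""
        return replace(self, quantity=quantity)

original = Order(order_id=1, quantity=3)
updated = original.with_quantity(5)   # original still has quantity == 3
```

Client code asks the object for a new Order rather than reaching in through accessors and mutators, which is the encapsulation point being made above.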
Post by topmind
I would note that a lot of the issues you mentioned, such as
performance, scalability, resiliency, and recoverability can be
obtained by purchasing a "big-iron" RDBMS such as Oracle or DB2. The
configuration and management of those issues is then almost a
commodity skill and not as tied to the domain as a roll-your-own
solution (which OO'ers tend to build) would be.
It is statements like this that strongly suggest that you have
never developed a large, complex system. The vast majority of
businesses that need systems of this complexity have legacy software
consisting of a number of COTS applications and custom components,
none of which were designed to work with each other. These have been
selected or developed for good business reasons and cannot be
aggregated and run on a single piece of kit, no matter how large.

Even if it were possible to go down the mainframe route, in many
cases it would not make business sense. Big iron is expensive to buy,
maintain, and upgrade. Distributed systems running on relatively
inexpensive hardware can provide a more cost-effective solution.
Post by topmind
"Security" is mostly just massive ACL tables.
That is profoundly . . . naive. I strongly urge you to read
everything you can find by Bruce Schneier, join the cryptography
mailing list run by Perry Metzger, and not say another word about
security until you understand why your statement is so deeply
embarrassing to you. For a quick, very small taste of why ACL tables
don't even begin to scratch the surface of the problem, read
http://www.isi.edu/gost/brian/security/kerberos.html.
Post by topmind
Post by Patrick May
Post by topmind
You are participating in Domain Bigotry here.
No, he is simply stating the obvious: CRUD/USER applications
are not particularly complex, especially when compared with other
software systems.
Again, I have yet to see an objective or even semi-objective way to
measure "complexity".
I suggest you Google for "software complexity" and you'll find
several million links. Starting from a page like
http://yunus.hun.edu.tr/~sencer/complexity.html will give you pointers to
other research if you are genuinely interested in learning.
Post by topmind
Again, the basic concepts of CRUD are easier to learn than most
other domains or problems, but that does not mean that the
implementation and maintenance of related apps is simple.
CRUD applications are, however, not particularly complex as
software systems go. Your claims otherwise indicate a lack of
experience with anything else.
Post by topmind
For lack of a better metric, I propose lines of code (LOC) for now
as our metric for complexity.
This is yet another suggestion that shows you don't know much
about the topic you're discussing. Lines of code is not a good
measurement of anything. Do some research.
Post by topmind
Can you make an argument for something being lots of LOC without
being "complex" (beyond learning the domain)? If not, then you are
probably stuck using LOC here.
Some of the largest programs I've seen, in terms of lines of
code, are for generating reports. Getting just the right information
from a large data set and laying it out precisely as requested can be
time consuming, but it's not particularly challenging.

Object technology is not immune. J2EE in general, and EJBs in
particular, require a great deal of code to provide functionality that
could be provided far more efficiently.

On the other hand, there are some delightfully complex software
systems that consist of only a few hundred lines of code. Functional
languages seem especially good for this. See one of Peter Norvig's
books for a few examples.

Lines of code is a useless metric.

Sincerely,

Patrick

------------------------------------------------------------------------
S P Engineering, Inc. | The experts in large scale distributed OO
| systems design and implementation.
***@spe.com | (C++, Java, Common Lisp, Jini, CORBA, UML)
Mikito Harakiri
2006-01-16 21:43:14 UTC
Permalink
Post by Patrick May
Post by topmind
Again, I have yet to see an objective or even semi-objective way to
measure "complexity".
I suggest you Google for "software complexity" and you'll find
several million links. Starting from a page like
http://yunus.hun.edu.tr/~sencer/complexity.html will give you pointers to
other research if you are genuinely interested in learning.
Hmm. I tried to find "software complexity" in wikipedia and failed.
Apparently this topic (and the link you supplied) is a typical example
of junk science.

In general, I agree with Bruce -- there is no objective measure for
program complexity. What is the complexity measure of
100000110000000000
for example? Speaking of *finite* objects, it is a basic fact that one
TM can model another; therefore, you have to [arbitrarily] choose some
reference TM. It is an interesting [and nontrivial] fact that there is a
way to establish complexity metrics for infinite objects, though.
Hasta
2006-01-16 22:13:45 UTC
Permalink
Post by Mikito Harakiri
In general, I agree with Bruce -- there is no objective measure for
program complexity. What is the complexity measure of
100000110000000000
Well, there is an objective measure of the complexity of
100000110000000000. It's the length of the smallest
program able to generate that string.

Browse for Chaitin-Kolmogorov complexity/randomness.
A fascinating subject :-)
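Kolmogorov complexity itself is uncomputable, but compressed size gives a computable upper bound that captures the same intuition; a small Python sketch, with zlib chosen arbitrarily as the fixed description language:

```python
import random
import zlib

def description_length(s: str) -> int:
    """Upper bound on description length: size in bytes after zlib compression."""
    return len(zlib.compress(s.encode("ascii"), 9))

regular = "10" * 500                   # a highly regular 1000-char string
rng = random.Random(0)                 # seeded, so the example is repeatable
noisy = "".join(rng.choice("01") for _ in range(1000))

# The regular string compresses far better than the noisy one.
```

For strings as short as 100000110000000000 the compressor's fixed overhead swamps any difference, which echoes the objection above that the complexity of small finite objects depends on the chosen reference machine.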
Mikito Harakiri
2006-01-16 22:48:48 UTC
Permalink
Post by Hasta
Post by Mikito Harakiri
In general, I agree with Bruce -- there is no objective measure for
program complexity. What is the complexity measure of
100000110000000000
Well, there is an objective measure of the complexity of
100000110000000000. It's the length of the smallest
program able to generate that string.
Browse for chaitin-kolmogorov complexity/randomness.
A fascinating subject :-)
This is exactly what I had in mind (although I wanted to emphasize
the Martin-Loef criterion of randomness). Therefore, what is the length of
that program in the earlier example?
Post by Hasta
From wikipedia: "More formally, the complexity of a string is the
length of the string's shortest description in some fixed description
language. The sensitivity of complexity relative to the choice of
description language is discussed below."

Excuse me, but this is not a very practical suggestion. For finite
objects there is no mathematically sound way to establish that

100000110000000000

is more complex than

100000000000000000

Again, this is what the earlier message "no *objective* measure for
program complexity" was saying.
Hasta
2006-01-17 06:19:26 UTC
Permalink
Post by Mikito Harakiri
Post by Hasta
Well, there is an objective measure of the complexity of
100000110000000000. It's the length of the smallest
program able to generate that string.
Browse for chaitin-kolmogorov complexity/randomness.
A fascinating subject :-)
This is exactly what I had in mind (although I wanted to emphasize
Martin-Loef criteria of randomness). Therefore, what is the length of
that program in the earlier example?
Post by Hasta
From wikipedia: "More formally, the complexity of a string is the
length of the string's shortest description in some fixed description
language. The sensitivity of complexity relative to the choice of
description language is discussed below."
Excuse me, but this is not a very practical suggestion. For finite
objects there is no mathematically sound way to establish that
100000110000000000
is more complex than
100000000000000000
Again, this is what the earlier message "no *objective* measure for
program complexity" was saying.
Well, pick your preferred language and make it part of your definition
of complexity. You then have a very objective measure. Chaitin uses
a micro-lisp with seven statements.

With all reasonable general-purpose languages (including English :-)
complexity (100000110000000000) is greater than complexity
(100000000000000000). In English, the complexity of the latter is 6.
The complexity of the former is probably 11.

Of course, the main problem of C/K complexity is that it is not
computable in general :-)

Have a nice day, Mikito.

--- Raoul
topmind
2006-01-17 02:31:29 UTC
Permalink
Post by Hasta
Post by Mikito Harakiri
In general, I agree with Bruce -- there is no objective measure for
program complexity. What is the complexity measure of
100000110000000000
Well, there is an objective measure of the complexity of
100000110000000000. It's the length of the smallest
program able to generate that string.
That is more or less our old friend the "lines of code" metric.

However, code-size metrics can be sticky to measure. For example,
shouldn't long lines be counted as more than short lines? Should long
names be penalized? Does a parenthesis count as much as a variable
name? If not, how much? 1/3? If we don't count it, then somebody could
write a parenthesis-oriented language and win it all. (Kind of like
that BrainF*ck language.)
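One common workaround for these ambiguities is to count lexical tokens instead of physical lines; a quick Python sketch using the standard tokenize module shows the same function scoring the same in tokens while its LOC varies with layout:

```python
import io
import tokenize

def lines_and_tokens(source: str):
    """Return (physical lines, significant tokens) for a Python snippet."""
    layout_only = {tokenize.NEWLINE, tokenize.NL, tokenize.INDENT,
                   tokenize.DEDENT, tokenize.COMMENT, tokenize.ENDMARKER}
    tokens = [t for t in tokenize.generate_tokens(io.StringIO(source).readline)
              if t.type not in layout_only]
    return len(source.strip().splitlines()), len(tokens)

one_liner = "def f(x): return x * x + 1\n"
spread    = "def f(x):\n    return x * x \\\n        + 1\n"
# Same 12 significant tokens either way; 1 line versus 3 lines.
```

A token count still doesn't settle whether a parenthesis should weigh as much as a variable name, but it is at least insensitive to line wrapping.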

Plus, it may be nearly impossible to prove that a given candidate is
the shortest *possible*. We could only compare what is presented.

However, I don't think P. May is interested in *any* kind of code size
metric. However, I have no idea what he has in mind as an alternative.
Post by Hasta
Browse for chaitin-kolmogorov complexity/randomness.
A fascinating subject :-)
-T-
Dmitry A. Kazakov
2006-01-17 09:07:37 UTC
Permalink
Post by Hasta
Post by Mikito Harakiri
In general, I agree with Bruce -- there is no objective measure for
program complexity. What is the complexity measure of
100000110000000000
Well, there is an objective measure of the complexity of
100000110000000000. It's the length of the smallest
program able to generate that string.
See Richard's paradox.

[ There cannot be objective measure, if no language fixed. ]
--
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de
Oliver Wong
2006-01-16 23:58:20 UTC
Permalink
Post by Mikito Harakiri
Post by Patrick May
Post by topmind
Again, I have yet to see an objective or even semi-objective way to
measure "complexity".
I suggest you Google for "software complexity" and you'll find
several million links. Starting from a page like
http://yunus.hun.edu.tr/~sencer/complexity.html will give you pointers to
other research if you are genuinely interested in learning.
Hmm. I tried to find "software complexity" in wikipedia and failed.
Apparently this topic (and the link you supplied) is a typical example
of junk science.
In general, I agree with Bruce -- there is no objective measure for
program complexity. What is the complexity measure of
100000110000000000
for example?
You might want to lookup Entropy (in the context of information, not
thermodynamics). If you like Wikipedia, see
http://en.wikipedia.org/wiki/Information_entropy

But I suspect this has very little to do with "Software Complexity" as
Patrick May means it.
Post by Mikito Harakiri
Speaking of *finite* objects, it is a basic fact that one
TM can model another; therefore, you have to [arbitrarily] choose some
reference TM. It is an interesting [and nontrivial] fact that there is a
way to establish complexity metrics for infinite objects, though.
I'm not sure what the "basic" fact that one Turing Machine can model
another Turing Machine has to do with software complexity or information
entropy. Seems rather offtopic here.

What does it mean for an object to be "infinite" in this context, and
what does it mean for an infinite object to be "complex" in this context?

- Oliver
Mikito Harakiri
2006-01-17 00:43:10 UTC
Permalink
Post by Oliver Wong
Post by Mikito Harakiri
In general, I agree with Bruce -- there is no objective measure for
program complexity. What is the complexity measure of
100000110000000000
for example?
You might want to lookup Entropy (in the context of information, not
thermodynamics). If you like Wikipedia, see
http://en.wikipedia.org/wiki/Information_entropy
But I suspect this has very little to do with "Software Complexity" as
Patrick May means it.
You have to realize that the whole idea of defining random sequences
arose from mathematicians being unsatisfied with the foundations of
probability theory. It was Kolmogorov who established probability
theory as merely an applied measure theory. Unfortunately, this
development had very little practical impact, as it failed to define
what a random sequence is. Kolmogorov continued his quest, but failed to
give a satisfactory definition of a random sequence. It was Martin-Loef
who succeeded in completing Kolmogorov's program in 1966. Then Gregory
Chaitin developed the theory further, emphasizing the computational
complexity side.

Entropy, while being just a statistical measure, suffers from the same
foundation problems that plagued probability theory. You can't define
a statistical measure on a sample consisting of just a single object.

Perhaps I have to explain why the concept of a "random" sequence is
important in the context of this complexity discussion. It is
considered more challenging to generate a "random" sequence as compared
to a "nonrandom" one; hence, intuitively, random sequences are more
complex than non-random ones. I believe Knuth has some initial
exposition of random sequence generation in volume 2.
Post by Oliver Wong
Post by Mikito Harakiri
Speaking of *finite* objects, it is a basic fact that one
TM can model another; therefore, you have to [arbitrarily] choose some
reference TM. It is an interesting [and nontrivial] fact that there is a
way to establish complexity metrics for infinite objects, though.
I'm not sure what the "basic" fact that one Turing Machine can model
another Turing Machine has to do with software complexity or information
entropy. Seems rather offtopic here.
http://en.wikipedia.org/wiki/Kolmogorov_complexity

The first theorem in the basic results uses the concept of universal
TM.
Post by Oliver Wong
What does it mean for an object to be "infinite" in this context, and
what does it mean for an infinite object to be "complex" in this context?
An infinite sequence of 0s and 1s versus a finite one. Again, the concept
of a random infinite sequence is quite counterintuitive. You can prefix a
random sequence with a million 1s, and it would still be a random
sequence. Once again, there is no way to define what a random *finite*
sequence is. In layman's terms, if you go to Vegas and the roulette
produces a sequence of 10 zeros in a row, you can suspect that the
roulette is defective, but there is no mathematical foundation that
would support your belief.
Dmitry A. Kazakov
2006-01-17 09:38:26 UTC
Permalink
Post by Mikito Harakiri
Perhaps I have to explain why the concept of "random" sequence is
important in the context of this complexity discussion. It is
considered more challenging to generate "random" sequence as compared
to "nonrandom" one, hence intuitively random sequences are more complex
than non-random ones.
Huh, 1111111 is exactly as random as 10101101. As a matter of fact, there
is no way to generate random sequences using a FSM. There is no way even to
test if a sequence is a realization of a random process or not. You can
only test some hypothesis H and the answer will be: the probability of H is
in the interval [a,b].
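That hypothesis-testing view can be made concrete with the simplest such test, the frequency (monobit) test; a Python sketch using the usual normal approximation (admittedly crude for strings this short):

```python
from math import erfc, sqrt

def monobit_p_value(bits: str) -> float:
    """p-value for the hypothesis that the bits came from a fair coin.

    Maps 0 -> -1 and 1 -> +1, sums, and applies the normal approximation
    p = erfc(|S| / sqrt(2n)). A small p suggests rejecting "fair coin".
    """
    n = len(bits)
    s = sum(+1 if b == "1" else -1 for b in bits)
    return erfc(abs(s) / sqrt(2 * n))

# "1111111" fails this test at the 5% level while "10101101" passes it --
# even though no test can certify that a particular finite string is random.
```

Note the test only bounds the probability under a hypothesis, exactly as described above; it says nothing definitive about any individual finite string.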
Post by Mikito Harakiri
Post by Oliver Wong
What does it mean for an object to be "infinite" in this context, and
what does it mean for an infinite object to be "complex" in this context?
An infinite sequence of 0s and 1s versus a finite one. Again, the concept
of a random infinite sequence is quite counterintuitive.
Well, randomness as a whole is counterintuitive and there is nothing to do
about it.
Post by Mikito Harakiri
You can prefix a
random sequence with a million 1s, and it would still be a random
sequence. Once again, there is no way to define what a random *finite*
sequence is.
Egh? What is meant here by non-infinity? That the random variable ceases
to exist after N trials, or that you stop the trial after N attempts?
Post by Mikito Harakiri
In layman's terms, if you go to Vegas and the roulette produces a
sequence of 10 zeros in a row, you can suspect that the roulette is
defective, but there is no mathematical foundation that would support
your belief.
Right, which does not ruin Kolmogorov's concept of complexity, which fixes
the language. 10101101 looks complex, not because it inherently is, but
solely because of the language humans are using. Change the language and it
might become simple.
--
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de
Oliver Wong
2006-01-18 23:01:15 UTC
Permalink
Post by Mikito Harakiri
Post by Oliver Wong
Post by Mikito Harakiri
In general, I agree with Bruce -- there is no objective measure for
program complexity. What is the complexity measure of
100000110000000000
for example?
You might want to lookup Entropy (in the context of information, not
thermodynamics). If you like Wikipedia, see
http://en.wikipedia.org/wiki/Information_entropy
But I suspect this has very little to do with "Software Complexity" as
Patrick May means it.
You have to realize that the whole idea of defining random sequences
arose from mathematicians being unsatisfied with the foundations of
probability theory. It was Kolmogorov who established probability
theory as merely an applied measure theory. Unfortunately, this
development had very little practical impact, as it failed to define
what a random sequence is. Kolmogorov continued his quest, but failed to
give a satisfactory definition of a random sequence. It was Martin-Loef
who succeeded in completing Kolmogorov's program in 1966. Then Gregory
Chaitin developed the theory further, emphasizing the computational
complexity side.
Sorry, I didn't really follow this above paragraph. Not sure what you
were trying to say with this. Is it important to your point, or just
background info to prepare for the next paragraph? If it's the former, could
you explain it again?

Perhaps I completely misunderstood Patrick May, but it seems he is
talking about the complexity of developing real-world programs, and not of
arbitrary text strings, which is why I said "I suspect this has very little
to do with 'Software Complexity' as Patrick May means it."
Post by Mikito Harakiri
Entropy, while being just a statistical measure, suffers from the same
foundation problems that plagued probability theory. You can't define
statistical measure on a sample consisting of just a single object.
I didn't mean for "You might want to lookup Entropy" to be a direct
answer to your question of "there is no objective measure for program
complexity. What is the complexity measure of 100000110000000000 for
example?", but rather as a starting point for further research. The question
seems a bit misguided to me, so I had assumed you didn't know much about the
topic.

There isn't enough information in the question to actually give an
answer to it. It is not certain, for example, whether "100000110000000000"
represents a binary number, a decimal number, a number in some other base,
an ASCII string, a Unicode string, or something else. Given the context, one
might assume that it was the binary representation of a computer program
(but for what platform?), but again nothing is explicitly said one way or
the other.

Let's assume that it's the binary representation for a program for some
particular platform, which implies that it describes an actual valid
program. If so, then we have defined the "language" we are interested in:
the language is the set of all finite strings which are binary
representations for a program in the given platform.

Presumably, there exists some binary strings which are NOT legal
representations of programs, and so binary may not be the optimal (in terms
of size of strings) representation for this program. Also, the platform
might have some peculiarities that lead to all legal representations having
the same prefix (as a header, for example).

We can perform a measure, then, of how much information is present in
this string for that language, and that measure can be called the complexity
of the string (and hence the program). Traditionally, though, this measure
is called the information entropy.
Post by Mikito Harakiri
Perhaps I have to explain why the concept of a "random" sequence is
important in the context of this complexity discussion. It is
considered more challenging to generate a "random" sequence as compared
to a "nonrandom" one; hence, intuitively, random sequences are more
complex than non-random ones. I believe Knuth has some initial
exposition of random sequence generation in volume 2.
I'm not sure how you are defining "challenging" here. If you generate a
random sequence, you may get lucky, and end up with a very "simple"
sequence, where I'm using "simple" in the sense of "low entropy". You could
do some probabilistic analysis and determine the "average complexity" of a
randomly generated sequence, given your generating algorithm.

But whatever that value turns out to be, I could then hand-craft a
program which is more complex (i.e. whose entropy is higher). So what does
it mean for a random sequence to be "more complex" than a non-random one?
What does it mean for a sequence to be random or non-random in the first
place?

Let's say you randomly generate the sequence "100101", and then
immediately afterwards, I look at your sequence, and then generate my own
("non-random") sequence: "100101". Does that mean that my sequence is "less
complex" than the random one? Even though they are exactly identical?
Post by Mikito Harakiri
Post by Oliver Wong
Post by Mikito Harakiri
Speaking of *finite* objects, it is a basic fact that one
TM can model another; therefore, you have to [arbitrarily] choose some
reference TM. It is an interesting [and nontrivial] fact that there is a
way to establish complexity metrics for infinite objects, though.
I'm not sure what the "basic" fact that one Turing Machine can model
another Turing Machine has to do with software complexity or information
entropy. Seems rather offtopic here.
http://en.wikipedia.org/wiki/Kolmogorov_complexity
The first theorem in the basic results uses the concept of universal
TM.
Yes, but "Kolmogorov complexity" is also offtopic with respect to
"software complexity" as was meant originally in this thread. And I'm not
even sure "on-topic-ness" is a transitive property, so even if "Kolmogorov
complexity" were on-topic, it will still not show that Turing Machines are
on-topic to the fact that lines of codes are a poor measure of software
complexity.
Post by Mikito Harakiri
Post by Oliver Wong
What does it mean for an object to be "infinite" in this context, and
what does it mean for an infinite object to be "complex" in this context?
An infinite sequence of 0s and 1s versus a finite one.
Again, I'm not sure what sequences of 0s and 1s have to do with Lines Of
Code.

Maybe you should start a new topic so as to not confuse people as to
what exactly you are talking about.

- Oliver
Mikito Harakiri
2006-01-18 23:40:20 UTC
Permalink
Post by Oliver Wong
Perhaps I completely misunderstood Patrick May, but it seems he is
talking about the complexity of developing real-world programs, and not of
arbitrary text strings, which is why I said "I suspect this has very little
to do with 'Software Complexity' as Patrick May means it."
The notion of complexity is elusive. Emphasising "real world" in "I'm
studying the complexity of real-world programs (as opposed to the
complexity of TM programs?)" brings nothing new to the table.
Post by Oliver Wong
Let's assume that it's the binary representation for a program for some
particular platform, which implies that it describes an actual valid
program. If so, then we have defined the "language" we are interested in:
the language is the set of all finite strings which are binary
representations for a program in the given platform.
Presumably, there exists some binary strings which are NOT legal
representations of programs, and so binary may not be the optimal (in terms
of size of strings) representation for this program. Also, the platform
might have some peculiarities that lead to all legal representations having
the same prefix (as a header, for example).
We can perform a measure, then, of how much information is present in
this string for that language, and that measure can be called the complexity
of the string (and hence the program). Traditionally, though, this measure
is called the information entropy.
OK. Consider a language {01,0101,010101,01010101,...}.
What is the "information entropy" of the string 0101?
Post by Oliver Wong
Let's say you randomly generate the sequence "100101", and then
immediately afterwards, I look at your sequence, and then generate my own
("non-random") sequence: "100101". Does that mean that my sequence is "less
complex" than the random one? Even though they are exactly identical?
This paragraph doesn't make any sense. A random sequence is defined as
one that can withstand all possible statistical tests. A random
sequence is considered to be more complex than a sequence which is not
random. In fact, a random sequence can't be described by a program which
is shorter than the sequence itself. Unfortunately, the definition of
random doesn't apply to finite sequences.
Oliver Wong
2006-01-19 17:19:47 UTC
Permalink
Post by Mikito Harakiri
Post by Oliver Wong
Perhaps I completely misunderstood Patrick May, but it seems he is
talking about the complexity of developing real-world programs, and not of
arbitrary text strings, which is why I said "I suspect this has very little
to do with 'Software Complexity' as Patrick May means it."
Complexity notion is elusive. Emphasising "real world" in "I'm
studying complexity of real-world programs (as opposed to complexity of
TM programs?)" brings nothing new to the table.
I'm not sure what you mean by "brings nothing new to the table"; if "the
table" is a metaphor for this discussion, then I am trying to prevent
off-topic things from entering the table!

Let's say you take the binary representation of "real-world program A",
and then calculate its Kolmogorov complexity and find out it's 5. Then you
take the binary representation of "real-world program B", calculate its
Kolmogorov complexity and find out it's 7. Does that we needed more skilled
programmers to develop program B than program A? Does it mean that the
source code for program A is easier to understand than that of program B?
What about if one was compiled in such a way that the compiler unrolled
loops and padded variables to align them with highspeed memory access
boundaries, and the other was not?

This is why I say "Kolmogorov complexity" is probably not the same thing
as "Software Complexity" in the sense that it was used earlier in this
thread.
Post by Mikito Harakiri
Post by Oliver Wong
Let's assume that it's the binary representation for a program for some
particular platform, which implies that it describes an actual valid
program. If so, then we have defined the "language" we are interested in:
the language is the set of all finite strings which are binary
representations for a program in the given platform.
Presumably, there exists some binary strings which are NOT legal
representations of programs, and so binary may not be the optimal (in terms
of size of strings) representation for this program. Also, the platform
might have some peculiarities that lead to all legal representations having
the same prefix (as a header, for example).
We can perform a measure, then, of how much information is present in
this string for that language, and that measure can be called the complexity
of the string (and hence the program). Traditionally, though, this measure
is called the information entropy.
OK. Consider a language {01,0101,010101,01010101,...}.
What is the "information entropy" of the string 0101?
The alphabet of this language is {0,1,e} where e signifies the end of
the string. The entropy of each character in the string is H(x)
= -SUM[i=1..n]{p(i)log2(p(i))}, where i represents each possible character
that may appear next (n is the number of possible characters), and p(i) is
the probability of i occurring next.

Since we have 3 characters in this alphabet, n = 3, and the summation
will always be of the form:

p('0')log2(p('0'))
+ p('1')log2(p('1'))
+ p('e')log2(p('e'))

The first character in this language is always '0', so we have p('0') = 1,
p('1') = 0 and p('e') = 0. The summation gives us a value of 0, and we take
its negative, which is still 0. So the first character has zero
informational entropy. That is, whether or not we see that character, we
still have the same amount of information. In other words, we knew we were
going to see a '0' anyway.

Similarly, the second character is always '1', and so there's still zero
information there.

The third character could either be '0' or 'e'. I don't know what the
probability distribution is, so let's just assume uniform distribution (it
makes the math easier). The summation is:

p('0')log2(p('0'))
+ p('1')log2(p('1'))
+ p('e')log2(p('e'))

=

0.5 * -1
+ 0
+ 0.5 * -1

=

-1

And we take the negation, to give us 1. So the next character, whatever it
is, will provide us with 1 bit of information. Let's say it was '0'. Then we
know the next character must be '1'. After that, it might be '0' or 'e'
again, and provides us with 1 bit of information again.

So I'd say the overall informational entropy of your string is 2 bits.
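The arithmetic above can be checked mechanically; a small Python sketch of the same calculation, where the per-position distributions restate the uniform 50/50 assumption made above:

```python
from math import log2

def entropy(probs):
    """Shannon entropy H = -sum p*log2(p), skipping zero-probability outcomes."""
    return -sum(p * log2(p) for p in probs if p > 0)

# Next-character distributions seen while reading "0101" + end-of-string
# in the language {01, 0101, 010101, ...}:
per_position = [
    {"0": 1.0},            # 1st char is forced to '0'  -> 0 bits
    {"1": 1.0},            # 2nd char is forced to '1'  -> 0 bits
    {"0": 0.5, "e": 0.5},  # continue or stop           -> 1 bit
    {"1": 1.0},            # after '0', '1' is forced   -> 0 bits
    {"0": 0.5, "e": 0.5},  # continue or stop           -> 1 bit
]
total_bits = sum(entropy(d.values()) for d in per_position)
```

total_bits comes out to 2.0, matching the hand calculation.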
Post by Mikito Harakiri
Post by Oliver Wong
Let's say you randomly generate the sequence "100101", and then
immediately afterwards, I look at your sequence, and then generate my own
("non-random") sequence: "100101". Does that mean that my sequence is "less
complex" than the random one? Even though they are exactly identical?
This paragraph doesn't make any sense. A random sequence is defined as
one that can withstand all possible statistical tests.
I haven't heard of this definition before. What does it mean for a
sequence to "withstand" a test? Is a test a boolean function, and returning
'false' implies not-withstanding?
Post by Mikito Harakiri
A random sequence is considered to be more complex than a sequence which is
not random. In fact, a random sequence can't be defined with a program which
is shorter than the sequence itself. Unfortunately, the definition of
random doesn't apply to finite sequences.
The definition of a random sequence not being definable with a program
shorter than the sequence itself refers to a Chaitin-Kolmogorov randomness,
and it is not the same as a statistical randomness, so you probably
shouldn't be mixing the two in the above paragraph like that.

The "above paragraph" which did not make sense to you was meant to show
you that I do not understand what you mean by "it is more challenging to
generate a random sequence than a non-random sequence". I still do not know
what you mean by "more challenging". How do you measure challenge levels?

- Oliver
Mikito Harakiri
2006-01-19 20:55:10 UTC
Permalink
Post by Oliver Wong
Post by Mikito Harakiri
Consider a language {01,0101,010101,01010101,...}
what is the "information entropy" of the string 0101?
The alphabet of this language is {0,1,e} where e signifies the end of
the string. The entropy of each character in the string is H(x)
= -SUM[i=1..n]{p(i)log2(p(i))}, where i represents each possible character
that may appear next (n is the number of possible characters), and p(i) is
the probability of i occurring next.
Since we have 3 characters in this alphabet, n = 3, and the summation
p('0')log2(p('0'))
+ p('1')log2(p('1'))
+ p('e')log2(p('e'))
The first character in this language is always '0', so we have p('0') = 1,
p('1') = 0 and p('e') = 0. The summation gives us a value of 0, and we take
its negative, which is still 0. So the first character has zero
informational entropy. That is, whether or not we see that character, we
still have the same amount of information. In other words, we knew we were
going to see a '0' anyway.
Similarly, the second character is always '1', and so there's still zero
information there.
The third character could either be '0' or 'e'. I don't know what the
probability distribution is, so let's just assume uniform distribution (it
p('0')log2(p('0'))
+ p('1')log2(p('1'))
+ p('e')log2(p('e'))
=
0.5 * -1
+ 0
+ 0.5 * -1
=
-1
And we take the negation, to give us 1. So the next character, whatever it
is, will provide us with 1 bit of information. Let's say it was '0'. Then we
know the next character must be '1'. After that, it might be '0' or 'e'
again, and provides us with 1 bit of information again.
So I'd say the overall informational entropy of your string is 2 bits.
OK. Granted, you can measure the information of a finite string in a
language. This still doesn't help with comparing the complexity of two
finite strings when we don't know the language.

Given the two Turing machine descriptions, how would I decide which one
is more complex? I proposed to compare them as formal sequences of 1s
and 0s. What is the alternative way to compare TMs you suggest?
Post by Oliver Wong
Post by Mikito Harakiri
A random sequence is defined as
one that can withstand all possible statistical tests.
I haven't heard of this definition before. What does it mean for a
sequence to "withstand" a test? Is a test a boolean function, and returning
'false' implies not-withstanding?
One such test would be measuring the frequency of 1s and 0s. If we get a
frequency other than 1/2, then the sequence is not random.

http://user.it.uu.se/~vorobyov/Courses/KC/2000/l7.ps
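A minimal sketch of such a test in Python, for a finite prefix only (the function names and the 0.05 tolerance are illustrative choices of mine; the rigorous definition in the lecture notes concerns the limit frequency of an infinite sequence):

```python
def ones_frequency(bits):
    """Fraction of 1s in a finite prefix of a binary sequence."""
    return bits.count("1") / len(bits)

def passes_frequency_test(bits, tolerance=0.05):
    """Reject sequences whose frequency of 1s strays far from 1/2.

    For an infinite sequence the limit frequency must equal 1/2 exactly;
    for a finite prefix we can only check closeness, hence the tolerance.
    """
    return abs(ones_frequency(bits) - 0.5) <= tolerance

print(passes_frequency_test("0110100110010110"))  # balanced prefix: True
print(passes_frequency_test("1111111111111111"))  # all 1s: False
```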
Post by Oliver Wong
Post by Mikito Harakiri
A random sequence is considered to be more complex than a sequence which is
not random. In fact, a random sequence can't be defined with a program which
is shorter than the sequence itself. Unfortunately, the definition of
random doesn't apply to finite sequences.
The definition of a random sequence not being definable with a program
shorter than the sequence itself refers to a Chaitin-Kolmogorov randomness,
and it is not the same as a statistical randomness, so you probably
shouldn't be mixing the two in the above paragraph like that.
Chaitin-Kolmogorov randomness, statistical randomness, and the Martin-Löf
definition of a random sequence are the same.
Post by Oliver Wong
The "above paragraph" which did not make sense to you was meant to show
you that I do not understand what you mean by "it is more challenging to
generate a random sequence than a non-random sequence". I still do not know
what you mean by "more challenging". How do you measure challenge levels?
Computable strings are not random. Random strings are not computable.
Those are the two polar extremes on the complexity scale. It is challenging
to generate a random sequence in the sense that it's just impossible to do!
(Of course, this is highly informal, so we can omit this part of the
discussion altogether).
Dmitry A. Kazakov
2006-01-20 10:31:11 UTC
Permalink
Post by Mikito Harakiri
OK. Granted, you can measure the information of a finite string in a
language. This still doesn't help with comparing the complexity of two
finite strings when we don't know the language.
Sure. They are just incomparable in this case.
Post by Mikito Harakiri
Given the two Turing machine descriptions, how would I decide which one
is more complex?
What's "description of a Turing machine"? Do you mean a program for a
Turing machine here?
Post by Mikito Harakiri
I proposed to compare them as formal sequences of 1s
and 0s. What is the alternative way to compare TMs you suggest?
What about program length?

Though in software development, complexity is not length. It is something
like: how much money I need to develop P from scratch, having N ordinary
programmers, within time T, with the bug quota Q.
Post by Mikito Harakiri
One such test would be measuring frequency of 1s and 0s. If we get
frequency other that 1/2, then the sequence is not random.
Mikito, this is rubbish, sorry. Firstly, 1/2 is not a frequency. The frequency
is a random variable, so 1/2 could only be its expectation value.
Secondly, even if the expectation isn't 1/2, it is still random. Consider
Pr(1)=0.8, Pr(0)=0.2.

BTW, it would be interesting to scan the memory (or disk) of a set of real
computers to see whether that really is 0.5! (:-))
Post by Mikito Harakiri
Post by Oliver Wong
The "above paragraph" which did not make sense to you was meant to show
you that I do not understand what you mean by "it is more challenging to
generate a random sequence than a non-random sequence". I still do not know
what you mean by "more challenging". How do you measure challenge levels?
Computable strings are not random. Random strings are not computable.
That again depends on the language. If you refer to computability in the
sense of Turing machines then the above is true. But in general case, it
has no meaning.
Post by Mikito Harakiri
Those are the two polar extremes on the complexity scale. It is challenging
to generate a random sequence in the sense that it's just impossible to do!
Egh, the complexity of a non-existing program isn't infinite. It is
undefined.
Post by Mikito Harakiri
(Of course, this is highly informal, so we can omit this part of
the discussion altogether).
Let me make a try. To make some sense out of the statement above you should
find a meta-language, where random sequences would be computable as well as
all the sequences of the original language. Within this *new* language you
will have your scale. But, even so, there is no guarantee that the complexity
of each random sequence will be greater than that of any non-random sequence.
Consider: an x86 machine with an integrated random generator. If complexity
is defined as the program length, then I bet a program looping in reading
the generator's register is far less complex than a Basic compiler....
--
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de
Mikito Harakiri
2006-01-20 18:35:07 UTC
Permalink
Post by Dmitry A. Kazakov
Post by Mikito Harakiri
Given the two Turing machine descriptions, how would I decide which one
is more complex?
What's "description of a Turing machine"? Do you mean a program for a
Turing machine here?
I take it back. We can formalize the definition of computability, but the
programming system description stays informal.
Post by Dmitry A. Kazakov
Post by Mikito Harakiri
I proposed to compare them as formal sequences of 1s
and 0s. What is the alternative way to compare TMs you suggest?
What about program length?
Program length as a software complexity metric has been repeatedly
Post by Dmitry A. Kazakov
Though in software developing complexity is not lengths. It is something
like: how much money I need to develop P from scratch, having N ordinary
programmers within time T, with the bug quote Q.
Maybe. But isn't this kind of metric ill-defined? How about
reproducibility? You are hired to write forum software with 5
programmers and spend $1M, while Patrick's team of five spent $10M and
failed to deliver.
Post by Dmitry A. Kazakov
Post by Mikito Harakiri
One such test would be measuring frequency of 1s and 0s. If we get
frequency other that 1/2, then the sequence is not random.
Mikito, this is rubbish, sorry. Firstly, 1/2 is not a frequency. The frequency
is a random variable, so 1/2 could only be its expectation value.
Secondly, even if the expectation isn't 1/2, it is still random. Consider
Pr(1)=0.8, Pr(0)=0.2.
1/2 is the limit frequency, see

http://user.it.uu.se/~vorobyov/Courses/KC/2000/l8.ps

(This is actually better reference than I gave before).
Post by Dmitry A. Kazakov
Let me make a try. To make some sense out of the statement above you should
find a meta-language, where random sequences would be computable as well as
all the sequences of the original language. Within this *new* language you
will have your scale. But, even so, there is no warranty that complexity of
each random sequence will be greater than one of any non-random sequence.
Consider: an x86 machine with an integrated random generator. If complexity
is defined as the program length, then I bet a program looping in reading
the generator's register is far less complex than a Basic compiler....
I admit that I'm more confused now than at the beginning of the thread.
Is measuring program complexity the same as defining the complexity of
finite objects? If it is, then we have a challenging problem defining
complexity for objects as simple as binary strings. This is why I
brought up all this randomness stuff.

In any case, Patrick's assertion that software complexity is
1) a mature subject
2) which is worth studying
is far from convincing. Sure, if he submits a proposal to study
"real-world" program complexity, while somebody else submits a competing
proposal to study theoretical complexity, he is more likely to get a grant
from a "real" software development company. This is hardly proof of
anything.
Oliver Wong
2006-01-20 19:59:46 UTC
Permalink
Post by Mikito Harakiri
In any case, Patrick's assertion that software complexity is
1) a mature subject
2) which is worth studying
is far from convincing. Sure, if he submits a proposal to study
"real-world" program complexity, while somebody else submits a competing
proposal to study theoretical complexity, he is more likely to get a grant
from a "real" software development company. This is hardly proof of
anything.
The fact that Patrick, in the above fictional scenario, is more likely
to get a grant (as opposed to being equally likely), is evidence that the
"real" software development company can distinguish between "real world"
program complexity and theoretical complexity.

So when you say "Patrick, your ideas about software complexity are all
wrong, and here's my argument showing you why, which includes concepts
borrowed from Turing Completeness and Chaitin-Kolmogorov Randomness", you're
not talking about the same thing Patrick is talking about. Which is why the
ideas might seem all wrong to you.

A perhaps amusing analogy (inspired by events that actually occurred to me
recently):

Person A: I make glasses.
Person B: Ah, so you must have read such and such article about optic
refraction and human eye-ball evolution which shows the flaws inherent in
current glasses.
Person A: Uh, no... but I read a paper about maximizing the amount of volume
a certain planar height-map can hold while minimizing surface area.
Person B: You're talking about "glasses" in the sense of things made out of
glass, right?
Person A: Yes.
Person B: Then why would you ever want to maximize volume? The glasses
should be light! You want to minimize the volume so as not to burden the
user!
Person A: Well, yes, I want to minimize the materials used, while maximizing
the volume of the empty space that the glass holds! And besides, the mass of
the glass itself is insignificant compared to what it holds.
Person B: Get the terms straight. Glasses do not "hold" something. A user
perceives through the glasses, and furthermore...
[argument continues, neither parties realizing that Person A means
drinking-glasses and Person B means eye-glasses]

So I think we should forget about trying to reconcile this new topic of
complexity (Kolmogorov complexity) with the old topic ("real world" software
complexity). While related (just like eye-glasses and drinking-glasses are
both made of glass), there isn't much to gain by comparing the two.

I'd rather just go ahead and discuss interpretations of
Chaitin-Kolmogorov randomness and closely related topics.

- Oliver
Dmitry A. Kazakov
2006-01-21 11:39:32 UTC
Permalink
Post by Mikito Harakiri
Post by Dmitry A. Kazakov
Though in software developing complexity is not lengths. It is something
like: how much money I need to develop P from scratch, having N ordinary
programmers within time T, with the bug quote Q.
Maybe. But isn't this kind of metrics ill-defined? How about
reproducibility? You are hired to write a forum software having 5
programmers and spend $1M, while Patrick's team of five spent $10M and
failed to deliver.
No problem, complexity would be random because programmers are considered
to be selected at random. Though it gets tricky when there is one fixed
programmer. So I presume that uncertainty in metrics estimation contains a
heavy fuzzy component, in addition to the random one.
Post by Mikito Harakiri
Post by Dmitry A. Kazakov
Post by Mikito Harakiri
One such test would be measuring the frequency of 1s and 0s. If we get a
frequency other than 1/2, then the sequence is not random.
Mikito, this is rubbish, sorry. Firstly, 1/2 is not a frequency. The frequency
is a random variable, so 1/2 could only be its expectation value.
Secondly, even if the expectation isn't 1/2, it is still random. Consider
Pr(1)=0.8, Pr(0)=0.2.
1/2 is the limit frequency, see
http://user.it.uu.se/~vorobyov/Courses/KC/2000/l8.ps
(This is actually better reference than I gave before).
OK, that is an alternative [constructive] probability (or should I say
chances) theory. It should have difficulties with defining real-valued
random variables. Whether it is equivalent to the standard theory, I cannot
tell. Constructive mathematics often brings surprises. However, what about
modified "shooting paradox": you shot at a flat target at random. The
target is a square 1m x 1m. The coordinates of a hit are random. So random
are their binary representations. True? No! According to Martin-Löf
(0.5,0.5) isn't a valid hit, because 0.10000000 is not a random sequence!

Anyway, having a prefix of an infinite sequence you can say *nothing* about
its limit (if any.) Fundamentally, in either constructive or not
probability theory there cannot be any randomness test. The difference
could be that a constructive theory might accept non-constructive tests,
but that won't help you...
Post by Mikito Harakiri
Post by Dmitry A. Kazakov
Let me make a try. To make some sense out of the statement above you should
find a meta-language, where random sequences would be computable as well as
all the sequences of the original language. Within this *new* language you
will have your scale. But, even so, there is no warranty that complexity of
each random sequence will be greater than one of any non-random sequence.
Consider: an x86 machine with an integrated random generator. If complexity
is defined as the program length, then I bet a program looping in reading
the generator's register is far less complex than a Basic compiler....
I admit that I'm more confused now than at the beginning of the thread.
Is measuring program complexity the same as defining the complexity of
finite objects?
If objects are programs.
Post by Mikito Harakiri
If it is, then we have a challenging problem defining
complexity for objects as simple as binary strings.
Only if binary strings are programs. That's the whole point. Complexity
isn't a property of the object alone. It is of the object *and* the language.
Post by Mikito Harakiri
This is why I
brought up all this randomness stuff.
In any case, Patrick's assertion that software complexity is
1) a mature subject
Would be "a mature empirical subject" OK?
Post by Mikito Harakiri
2) which is worth studying
Sure, how otherwise could we make software development an engineering discipline some
day?
--
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de
Oliver Wong
2006-01-20 16:24:06 UTC
Permalink
Post by Mikito Harakiri
Post by Oliver Wong
Post by Mikito Harakiri
A random sequence is defined as
one that can withstand all possible statistical tests.
I haven't heard of this definition before. What does it mean for a
sequence to "withstand" a test? Is a test a boolean function, and returning
'false' implies not-withstanding?
One such test would be measuring the frequency of 1s and 0s. If we get a
frequency other than 1/2, then the sequence is not random.
http://user.it.uu.se/~vorobyov/Courses/KC/2000/l7.ps
So all non-uniform random distributions are "not random" according to
this test. For example, the normal (Gaussian) distribution, the Poisson
distribution, etc. would all fail this test. I'm not sure that this is a
good definition for "Random sequence".

I propose that, in a formal context, it doesn't make sense to use the
qualifier "Random" to describe a sequence. When someone says "This is a
random sequence", what I think they *really* mean, is that this sequence was
generated by a random process. Note that the binary sequence "0110101"
*might* have been generated by a random process, or it might have been
generated by a fixed, deterministic process. It doesn't make sense to say
that the sequence itself is "random" (in the statistical sense of the word
"random", not the "Chaitin-Kolmogorov" sense).
Post by Mikito Harakiri
Post by Oliver Wong
Post by Mikito Harakiri
A random sequence is considered to be more complex than a sequence which is
not random. In fact, a random sequence can't be defined with a program which
is shorter than the sequence itself. Unfortunately, the definition of
random doesn't apply to finite sequences.
The definition of a random sequence not being definable with a program
shorter than the sequence itself refers to a Chaitin-Kolmogorov randomness,
and it is not the same as a statistical randomness, so you probably
shouldn't be mixing the two in the above paragraph like that.
Chaitin-Kolmogorov randomness, statistical randomness, and the Martin-Löf
definition of a random sequence are the same.
Consider this simple program:

FOR %I = 1 TO 40
PRINT '1'
END FOR

It's 37 characters long, including whitespace. It prints an output of
size 40 characters. So according to Chaitin-Kolmogorov, the string that it
outputs is by definition NOT random.

But ask any statistician, "Is it possible that the value of a randomly
generated binary string be the digit '1' repeated 40 times?" and they will
surely answer "yes". So according to Kolmogorov, '1' 40 times is not random,
while according to statisticians, '1' 40 times could indeed have been
generated by a random process. This is why I say the definitions differ.
Post by Mikito Harakiri
Post by Oliver Wong
The "above paragraph" which did not make sense to you was meant to show
you that I do not understand what you mean by "it is more challenging to
generate a random sequence than a non-random sequence". I still do not know
what you mean by "more challenging". How do you measure challenge levels?
Computable string is not random. Random strings are not computable.
Those are the two polar extremes on complexity scale. It is challenging
to generate random sequence in a sence that it's just impossible to do!
(Of course, this is highly informal, so we can omit this part of
diuscussion altogether).
Ah yes, okay, again I was unfortunately thinking in the context of the
"old" thread, where I could pay someone $20 and tell him "Generate a random
sequence for me", and he wouldn't need a PhD degree in CompSci to do it. So
in that sense, it wasn't "complex".

Generating a random output given a fixed input is indeed impossible in
the Turing model of computation, but note that our computers, strictly
speaking, aren't using a Turing model of computation. Because we have finite
RAM and diskspace, our computers are more like DFAs than TMs. How is it that
our computers are able to generate (pseudo-)random sequences? They read from
an input which IS (pseudo-)random. This input might be based on the current
time, the hard-disk seek time, a radioactive decay process, etc.

When our computers are given this input, it's relatively trivial (in the
sense of O(n)) to generate a length n random sequence. So to let a program
on a TM produce random output, just make some (or all) of the input random.
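A sketch of that last point in Python, using the OS entropy pool (`os.urandom`, my choice of stand-in) as "an input which IS (pseudo-)random"; the program's own processing is deterministic and O(n):

```python
import os

def random_bits(n):
    """Produce n (pseudo-)random bits by reading the OS entropy source.

    The program itself is deterministic; all the randomness comes from
    its input (os.urandom), exactly as described above.
    """
    needed_bytes = (n + 7) // 8
    raw = os.urandom(needed_bytes)                 # the random input
    bits = "".join(format(b, "08b") for b in raw)  # deterministic processing
    return bits[:n]

print(random_bits(16))  # e.g. a different 16-bit string on each run
```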

- Oliver
Mikito Harakiri
2006-01-20 19:19:05 UTC
Permalink
Post by Oliver Wong
Post by Mikito Harakiri
http://user.it.uu.se/~vorobyov/Courses/KC/2000/l7.ps
So all non-uniform random distributions are "not random" according to
this test. For example, the normal (Gaussian) distribution, the Poisson
distribution, etc. would all fail this test. I'm not sure that this is a
good definition for "Random sequence".
Again,

http://user.it.uu.se/~vorobyov/Courses/KC/2000/l8.ps

is a better reference. (Interestingly, the lecture notes go backwards in
chronological order.) Von Mises' ideas are more intuitive than the
Martin-Löf definition.
Post by Oliver Wong
I propose that, in a formal context, it doesn't make sense to use the
qualifier "Random" to describe a sequence. When someone says "This is a
random sequence", what I think they *really* mean, is that this sequence was
generated by a random process. Note that the binary sequence "0110101"
*might* have been generated by a random process, or it might have been
generated by a fixed, deterministic process. It doesn't make sense to say
that the sequence itself is "random" (in the statistical sense of the word
"random", not the "Chaitin-Kolmogorov" sense).
Once again, the critical point is that you can define an *infinite* binary
sequence as random, but not a finite one.

I can't comment on the algorithmic versus statistical point of view.
Post by Oliver Wong
Post by Mikito Harakiri
Chaitin-Kolmogorov randomness, Statistical randomness, and Martin Loef
defintion of random sequence are the same.
FOR %I = 1 TO 40
PRINT '1'
END FOR
It's 37 characters long, including whitespace. It prints an output of
size 40 characters. So according to Chaitin-Kolmogorov, the string that it
outputs is by definition NOT random.
According to Chaitin-Kolmogorov you measure the complexity up to a
constant. The Kolmogorov complexity of your program is:

37 + constant

Some authors even assert it is a "small" constant. This is rubbish, of
course: if you can't bound the value, it can be
arbitrarily large. It is small in an asymptotic sense, but we are not
talking asymptotics when speaking of real-world programs. Or do we?
Post by Oliver Wong
But ask any statistician, "Is it possible that the value of a randomly
generated binary string be the digit '1' repeated 40 times?" and they will
surely answer "yes". So according to Kolmogorov, '1' 40 times is not random,
while according to statisticians, '1' 40 times could indeed have been
generated by a random process. This is why I say the definitions differ.
I can't comment on the statistics point of view. There is no way to apply
statistical methods to an individual binary sequence, be it finite or
infinite.
Post by Oliver Wong
Ah yes, okay, again I was unfortunately thinking in the context of the
"old" thread, where I could pay someone $20 and tell him "Generate a random
sequence for me", and he wouldn't need a PhD degree in CompSci to do it. So
in that sense, it wasn't "complex".
Generating a random output given a fixed input is indeed impossible in
the Turing model of computation, but note that our computers, strictly
speaking, aren't using a Turing model of computation. Because we have finite
RAM and diskspace, our computers are more like DFAs than TMs. How is it that
our computers are able to generate (pseudo-)random sequences? They read from
an input which IS (pseudo-)random. This input might be based on the current
time, the hard-disk seek time, a radioactive decay process, etc.
Please define what a pseudo-random sequence is.
Oliver Wong
2006-01-20 19:41:14 UTC
Permalink
Post by topmind
Post by Oliver Wong
Post by Mikito Harakiri
http://user.it.uu.se/~vorobyov/Courses/KC/2000/l7.ps
So all non-uniform random distributions are "not random" according to
this test. For example, the normal (Gaussian) distribution, the Poisson
distribution, etc. would all fail this test. I'm not sure that this is a
good definition for "Random sequence".
Again,
http://user.it.uu.se/~vorobyov/Courses/KC/2000/l8.ps
is a better reference. (Interestingly, lecture notes go backwards in
chronlogical order). Von Mises ideas are more intuitive, than Martin
Loef definition.
Okay, but this just shows that "Loef-Random" and "(Statistics-)Random"
are two completely different concepts. That's fine; it's just that I think a
lot of the disagreement we had came from a misunderstanding of terms. I had
assumed you meant "Random" in the "statistics and probabilities" sense, but
it turns out you were speaking of "Chaitin-Kolmogorov randomness", and then
later on, "Loef randomness" (I'm not sure if CK Random and L Random are
equivalent or not).

So from this point in the thread forward, we should probably specify
which "random" we mean. (In this post, I'm using CK to mean
Chaitin-Kolomogorov, L to mean Loef, and S&P to mean Statistics and
Probabilities)
Post by topmind
Post by Oliver Wong
I propose that, in a formal context, it doesn't make sense to use the
qualifier "Random" to describe a sequence. When someone says "This is a
random sequence", what I think they *really* mean, is that this sequence was
generated by a random process. Note that the binary sequence "0110101"
*might* have been generated by a random process, or it might have been
generated by a fixed, deterministic process. It doesn't make sense to say
that the sequence itself is "random" (in the statistical sense of the word
"random", not the "Chaitin-Kolmogorov" sense).
Once again, the critical point is that you can define an *infinite* binary
sequence as random, but not a finite one.
This might be true of CK randomness and L randomness, but a finite
sequence can easily be generated by an S&P-Random process: Flip a coin once,
and if it's heads, record '1'; else record '0'. This is a finite sequence
(length 1), and it was generated (S&P-)randomly.

I believe a finite binary sequence can also be CK random, because there
exists (finite) binary sequences, which are shorter than the computer
programs which can generate them.

As for L-Randomness, I don't know. I only briefly looked at the slide
you presented, so I haven't yet a firm grasp of what L-Randomness entails.

[snip]
Post by topmind
Post by Oliver Wong
Post by Mikito Harakiri
Chaitin-Kolmogorov randomness, Statistical randomness, and Martin Loef
defintion of random sequence are the same.
FOR %I = 1 TO 40
PRINT '1'
END FOR
It's 37 characters long, including whitespace. It prints an output of
size 40 characters. So according to Chaitin-Kolmogorov, the string that it
outputs is by definition NOT random.
According to Chaitin-Kolmogorov you measure the complexity up to a
37 + constant
Some authors even assert it is a "small" constant. This is rubbish, of
course: if you can't bound the value, it can be
arbitrarily large. It is small in an asymptotic sense, but we are not
talking asymptotics when speaking of real-world programs. Or do we?
I'm going to sidestep all of these issues by pointing out that the
definition of a CK-Random string is as follows:

"a string is Chaitin-Kolmogorov random if and only if it is shorter than any
computer program that can produce that string."

The length of that string is 40 characters. I have presented a computer
program that can produce that string, and that computer program is 37
characters. Thus the string is, by the above definition, not (CK-)random.

[snip]
Post by topmind
Post by Oliver Wong
Generating a random output given a fixed input is indeed impossible in
the Turing model of computation, but note that our computers, strictly
speaking, aren't using a Turing model of computation. Because we have finite
RAM and diskspace, our computers are more like DFAs than TMs. How is it that
our computers are able to generate (pseudo-)random sequences? They read from
an input which IS (pseudo-)random. This input might be based on the current
time, the hard-disk seek time, a radioactive decay process, etc.
Define what is pseudo-random sequence please.
I don't have a formal definition ready (perhaps some crypto-analyst
reading this thread could chime in?), but usually, when you
generate a "random" number on a computer, it is actually a pseudo-random
number, because the process by which it is generated is deterministic and
depends only on the seed and the state of the generator.

From Wikipedia:

<quote>
The outputs of most pseudorandom number generators are not truly random; they
only approximate some of the properties of random numbers.
[...]
Because any PRNG run on a deterministic computer (contrast quantum computer)
is a deterministic algorithm, its output will inevitably have one property
that a true random sequence would not exhibit: guaranteed periodicity. It is
certain that if the generator uses only a fixed amount of memory then, given
a sufficient number of iterations, the generator will revisit a previous
internal state, after which it will repeat forever. A generator that isn't
periodic can be designed, but its memory requirements would grow as it ran.
In addition, a PRNG can be started from an arbitrary starting point, or seed
state, and will always produce an identical sequence from that point on. The
practical significance of this periodicity is limited. The length of the
maximum period doubles with each bit of added memory. It is easy to build
PRNGs with periods so long that no computer could complete one cycle in the
expected lifetime of the universe. It is an open question, and one central
to cryptography, whether there is any way to distinguish the output of a
well-designed PRNG from perfect random noise without knowing its seed.
</quote>
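The guaranteed periodicity is easy to see with a deliberately tiny generator. This toy linear congruential generator (constants chosen by me only so the cycle is short) revisits its starting state after at most 16 steps, and the same seed always replays the identical sequence:

```python
def tiny_lcg(seed, modulus=16, a=5, c=3):
    """A toy linear congruential generator: x -> (a*x + c) mod modulus.

    With only `modulus` possible internal states, it must eventually
    revisit one and then repeat forever (the guaranteed periodicity).
    """
    x = seed
    while True:
        x = (a * x + c) % modulus
        yield x

gen = tiny_lcg(seed=7)
outputs = [next(gen) for _ in range(20)]
print(outputs)  # the cycle of length 16 starts repeating at index 16

# Restarting from the same seed reproduces the identical sequence:
gen2 = tiny_lcg(seed=7)
print([next(gen2) for _ in range(20)] == outputs)  # True
```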

- Oliver
Daniel Parker
2006-01-20 17:01:00 UTC
Permalink
Post by Mikito Harakiri
Post by Oliver Wong
Post by Mikito Harakiri
A random sequence is defined as
one that can withstand all possible statistical tests.
I haven't heard of this definition before. What does it mean for a
sequence to "withstand" a test? Is a test a boolean function, and returning
'false' implies not-withstanding?
One such test would be measuring the frequency of 1s and 0s. If we get a
frequency other than 1/2, then the sequence is not random.
I'm not sure that I fully understand this, but let's say that we have a
uniform random process that results in the sequence

11111111111111111111111111111111111111111111111111111111111111

This becomes a realization, the realized sequence is no longer random.
This particular realization is no less likely to occur than this one

00101010101000101010101010101010101010111010111010101010101010

They both have the same probability of occurring. It's not meaningful
to say that one is more random than the other.
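A toy calculation of that point (any specific fair-coin sequence of length n has probability (1/2)^n, however patterned it looks):

```python
# Any specific fair-coin sequence of length n has probability (1/2)**n,
# so the "boring" realization and the "random-looking" one tie exactly.
def prob(seq):
    return 0.5 ** len(seq)

s2 = "00101010101000101010101010101010101010111010111010101010101010"
s1 = "1" * len(s2)  # all-ones realization of the same length
print(prob(s1) == prob(s2))  # True: equally (im)probable realizations
```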

Regards,
Daniel Parker
Mikito Harakiri
2006-01-20 18:59:16 UTC
Permalink
Post by Daniel Parker
Post by Mikito Harakiri
Post by Oliver Wong
Post by Mikito Harakiri
Random sequence is defined as
the one that can withstand all possible statistics tests.
I haven't heard of this definition before. What does it mean for a
sequence to "withstand" a test? Is a test a boolean function, and returning
'false' implies not-withstanding?
One such test would be measuring frequency of 1s and 0s. If we get
frequency other than 1/2, then the sequence is not random.
I'm not sure that I fully understand this, but let's say that we have a
uniform random process that results in the sequence
11111111111111111111111111111111111111111111111111111111111111
This becomes a realization, the realized sequence is no longer random.
This particular realization is no less likely to occur than this one
00101010101000101010101010101010101010111010111010101010101010
They both have the same probability of occurring. It's not meaningful
to say that one is more random than the other.
Your observation is correct. In fact, many papers on this topic (which
I fail to locate on the web) begin with something like this:

You gamble by tossing a coin and bet on heads (assume 1s). Your
opponent bets on 0s. He tosses a coin repeatedly and the output is

0000000000000000

You accuse him of cheating (implying that the coin is defective, or that
he has a sophisticated way how to throw a coin, etc). What is the basis
for your accusation? Well, the probability of this outcome is
2^(-length). But so is the probability of any other outcome of the same
length! E.g.

1001001100111001

Hence all the dissatisfaction with "standard" probability theory
as applied measure theory. Von Mises asked whether randomness can
be defined in terms of frequencies

http://user.it.uu.se/~vorobyov/Courses/KC/2000/l8.ps

and this was actually the beginning of the theory evolved through the
chain of contributions by Kolmogorov, Martin Loef, and then Chaitin and
others.
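For the record, the frequency test mentioned upthread can be sketched in a few lines; this is a simplified, hand-rolled version of the usual monobit test, not any standard implementation:

```python
import math

# Simplified "monobit" frequency test: a sequence whose fraction of
# 1s is far from 1/2 (in standard-deviation units) fails the test.
def frequency_test(bits, threshold=3.0):
    n = len(bits)
    ones = sum(bits)
    # Under the fair-coin hypothesis, ones ~ Binomial(n, 0.5), so
    # (ones - n/2) / sqrt(n/4) is approximately standard normal.
    z = (ones - n / 2) / math.sqrt(n / 4)
    return abs(z) <= threshold

print(frequency_test([0] * 1000))  # False: all zeros fails
print(frequency_test([0, 1] * 500))  # True: balanced sequence passes
```

Note the all-zeros sequence fails this test even though, as discussed above, it is exactly as probable as any other specific outcome of the same length.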
Patrick May
2006-01-19 21:28:54 UTC
Permalink
Post by Mikito Harakiri
Post by Patrick May
Post by topmind
Again, I have yet to see an objective or even semi-objective way to
measure "complexity".
I suggest you Google for "software complexity" and you'll
find several million links. Starting from a page like
http://yunus.hun.edu.tr/~sencer/complexity.html will give you
pointers to other research if you are genuinely interested in
learning.
Hmm. I tried to find "software complexity" in wikipedia and failed.
Apparently this topic (and the link you supplied) is a typical
example of junk science.
Apparently your research skills are on a par with your logic.
The point I was making is that there are objective measures of
software complexity. The most appropriate metrics vary depending on
the domain, environment, and other factors, but the metrics themselves
are objective.

This is an area of active research, as you would know had you
done your homework before making spurious claims about "junk science."

Sincerely,

Patrick

------------------------------------------------------------------------
S P Engineering, Inc. | The experts in large scale distributed OO
| systems design and implementation.
***@spe.com | (C++, Java, Common Lisp, Jini, CORBA, UML)
Mikito Harakiri
2006-01-19 22:23:09 UTC
Permalink
Post by Patrick May
Post by Mikito Harakiri
Post by Patrick May
Post by topmind
Again, I have yet to see an objective or even semi-objective way to
measure "complexity".
I suggest you Google for "software complexity" and you'll
find several million links. Starting from a page like
http://yunus.hun.edu.tr/~sencer/complexity.html will give you
pointers to other research if you are genuinely interested in
learning.
Hmm. I tried to find "software complexity" in wikipedia and failed.
Apparently this topic (and the link you supplied) is a typical
example of junk science.
Apparently your research skills are on a par with your logic.
The point I was making is that there are objective measures of
software complexity. The most appropriate metrics vary depending on
the domain, environment, and other factors, but the metrics themselves
are objective.
So if you can compare complexity of software, then perhaps you may
suggest an objective criterion for comparing the complexity of one TM versus
another? Isn't a problem formulated in terms of a simple execution model
(that is, a TM) supposed to be simpler than the same problem formulated in
terms of programming languages (which are legacy artifacts)?
Post by Patrick May
This is an area of active research, as you would know had you
done your homework before making spurious claims about "junk science."
I formed this opinion from the web reference that you supplied. Naive
concepts everywhere ("Fan-In Fan-Out complexity"). No attempt to
generalize. You can't approach program complexity with an accountant's
methods; you need some insight.
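To be fair to the metric, the fan-in/fan-out idea is at least easy to state precisely. Here is a toy sketch; the call graph and the Henry-Kafura-style formula are illustrative assumptions, not taken from the linked page:

```python
# Toy sketch of a fan-in / fan-out metric: a module's complexity grows
# with how many modules call it (fan-in) and how many it calls
# (fan-out). The call graph below is invented purely for illustration.
calls = {
    "parse": ["lex"],
    "eval": ["parse", "lookup"],
    "repl": ["parse", "eval", "print"],
}

def fan_out(mod):
    return len(calls.get(mod, []))

def fan_in(mod):
    return sum(mod in callees for callees in calls.values())

def henry_kafura(mod, length=1):
    # Henry-Kafura-style formulation: length * (fan_in * fan_out) ** 2
    return length * (fan_in(mod) * fan_out(mod)) ** 2

print(fan_in("parse"), fan_out("repl"))  # 2 3
```

Whether such counts capture anything one actually cares about is, of course, exactly what is being disputed in this thread.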
Oliver Wong
2006-01-20 16:29:57 UTC
Permalink
Post by Mikito Harakiri
Post by Patrick May
Post by Mikito Harakiri
Post by Patrick May
Post by topmind
Again, I have yet to see an objective or even semi-objective way to
measure "complexity".
I suggest you Google for "software complexity" and you'll
find several million links. Starting from a page like
http://yunus.hun.edu.tr/~sencer/complexity.html will give you
pointers to other research if you are genuinely interested in
learning.
Hmm. I tried to find "software complexity" in wikipedia and failed.
Apparently this topic (and the link you supplied) is a typical
example of junk science.
Apparently your research skills are on a par with your logic.
The point I was making is that there are objective measures of
software complexity. The most appropriate metrics vary depending on
the domain, environment, and other factors, but the metrics themselves
are objective.
So if you can compare complexity of software, then perhaps you may
suggest an objective criterion for comparing the complexity of one TM versus
another? Isn't a problem formulated in terms of a simple execution model
(that is, a TM) supposed to be simpler than the same problem formulated in
terms of programming languages (which are legacy artifacts)?
Mikito, this is another example of what I was talking about when I said
what you're thinking when you see the word "software complexity" and what
Patrick is thinking when he sees the word "software complexity" obviously
differs.

I called Patrick's semantics "real world" and yours "theoretical"
because at the end of the day, a project lead working in a software
development company for money is much more likely to be interested in
Patrick's metrics than yours.

Hell, even "Lines of Codes" is probably going to be a more interesting
figure to him than the "Chaitin-Kolmogorov complexity of the binary
representation of the program compiled for an Win32 x86 platform".

I believe the "disagreement" between you two stems from this variance in
definition of the terms you're discussing.

- Oliver
Patrick May
2006-01-20 20:27:10 UTC
Permalink
Post by Oliver Wong
I called Patrick's semantics "real world" and yours
"theoretical" because at the end of the day, a project lead working
in a software development company for money is much more likely to
be interested in Patrick's metrics than yours.
Very well put. I think you've identified the core issue.

Regards,

Patrick

Patrick May
2006-01-20 20:25:07 UTC
Permalink
Post by Mikito Harakiri
Post by Patrick May
The point I was making is that there are objective measures of
software complexity. The most appropriate metrics vary depending
on the domain, environment, and other factors, but the metrics
themselves are objective.
So if you can compare complexity of software, then perhaps you may
suggest an objective criterion for comparing the complexity of one TM
versus another? Isn't a problem formulated in terms of a simple
execution model (that is, a TM) supposed to be simpler than the same
problem formulated in terms of programming languages (which are
legacy artifacts)?
First, I'd like to apologize for leaping to flame you in my last
post. It wasn't deserved or appropriate.

You ask an interesting question, but not one that is really
applicable to the point I was trying to make. The fact is that
objective measures of software complexity do exist and additional
metrics are under active development. None I know of are based on
Turing Machines, but I suppose it is possible (although converting
software written in Perl, for example, to a Turing Machine program in
order to apply the metric would be . . . challenging).

Sincerely,

Patrick

topmind
2006-01-17 03:01:01 UTC
Permalink
Post by Patrick May
Post by topmind
Post by Patrick May
Post by topmind
Post by H. S. Lahman
SQL does support abstraction in the sense that it abstracts
the RDB implementation. However, here I was referring to
problem space abstraction.
This depends on how one defines "problem space abstraction".
"Problem space abstraction" typically refers to a concept
from the domain for which the software system is being developed,
e.g. Customer, Sales Order, Network Element, Product, etc. This
is distinguished from "solution space abstractions" such as
tables, rows, columns, keys, pointers, functions, and so on. This
isn't a point of contention among experienced software developers.
But how are tables less close to the domain than classes, methods,
and attributes?
The ability to model behavior as well as data makes general
purpose languages better able to model the problem domain than is
SQL.
If you design right, you can *shift* much behavior to being data and DB
operations instead. SQL is close to being Turing Complete. Thus, it
only needs minor help from procedural code to implement any algorithm.
(I don't propose we do make it or other relational languages TC, by the
way.) Also, whether this is practical or not is another issue. Some
things are best done by the DB and some best done in app code. It is 2
paradigms that fit together like Yin and Yang.

Plus, OO is usually crappy at modeling behavior, at least in the biz
domain. OO is only nice when things split up into nice hierarchical
taxonomies. Most things don't in reality, so what is left is a mess.

Things like the Visitor Pattern have objectively more repetition than
multi-dispatching in tables. This is not an opinion but a fact.
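For the curious, the table-driven double dispatch being contrasted with the Visitor pattern can be sketched like this (shape kinds and operations are invented for illustration):

```python
# Table-driven double dispatch: one dictionary keyed by
# (node_kind, operation) replaces the accept/visit method pairs
# of the Visitor pattern. Kinds and operations are invented.
handlers = {
    ("circle", "area"):      lambda c: 3.14159 * c["r"] ** 2,
    ("rect",   "area"):      lambda r: r["w"] * r["h"],
    ("circle", "perimeter"): lambda c: 2 * 3.14159 * c["r"],
    ("rect",   "perimeter"): lambda r: 2 * (r["w"] + r["h"]),
}

def dispatch(shape, op):
    # Adding a new kind or a new operation means adding table rows,
    # not touching every existing class.
    return handlers[(shape["kind"], op)](shape)

print(dispatch({"kind": "rect", "w": 3, "h": 4}, "area"))  # 12
```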
Post by Patrick May
Post by topmind
(I agree that some of the behavior side is not something to be done
on the DB, but that is simply a partitioning of specialties issue.)
No, it's a qualitative difference.
What is wrong with partitioning that way? I agree it makes a few things
harder, but most things simpler because things that are good at data
are not good at behavior and vice versa. Attribute management via
relational is fairly clean and consistent. Attribute management via OO
is a goto-like spaghetti mess with every designer doing it differently,
modeling imaginary sh8t in their diverse heads.
Post by Patrick May
Post by topmind
Post by Patrick May
Post by topmind
Repetative SET/GET syndrome is an example of this poor pattern
factoring.
Proliferation of get/set methods is a code smell. Immutable
objects are to be preferred.
This is not the consensus in the OO community.
Yes, it is. Josh Bloch recommends immutability explicitly in
"Effective Java" and gives solid reasons for his position.
Proliferation of getters and setters violates encapsulation, one of
the defining characteristics of object technology. Some research will
show you that OO designs focus on behavior, not state. You should
also check out the Law of Demeter and similar guidelines that provide
further evidence that excessive use of accessors and mutators is not
good OO form.
Almost all of these have a fair amount of disagreement among OO
proponents. Check out c2.com.
Post by Patrick May
Post by topmind
I would note that a lot of the issues you mentioned, such as
performance, scalability, resiliency, and recoverability can be
obtained by purchasing a "big-iron" RDBMS such as Oracle or DB2. The
configuration and management of those issues is then almost a
commodity skill and not as tied to the domain as a roll-your-own
solution would be (which OO'ers tend to do).
It is statements like this that strongly suggest that you have
never developed a large, complex system.
No, because I break them up into pieces so that they don't grow to be
one big fat EXE. The Big-EXE methodology has a high failure rate.
Post by Patrick May
The vast majority of
businesses that need systems of this complexity have legacy software
consisting of a number of COTS applications and custom components,
none of which were designed to work with each other. These have been
selected or developed for good business reasons and cannot be
aggregated and run on a single piece of kit, no matter how large.
Agreed, but what does the existence of legacy apps have to do with your
scaling claim? Integration with legacy apps is just something we all
have to face.
Post by Patrick May
Even if it were possible to go down the mainframe route, in many
cases it would not make business sense.
Mainframe route?
Post by Patrick May
Big iron is expensive to buy,
maintain, and upgrade. Distributed systems running on relatively
inexpensive hardware can provide a more cost-effective solution.
Centralized server shops use the same hardware as decentralized ones
these days. It is a matter of where the servers are put, not how big
the individual boxes are or even how many there are.


I did not mean to suggest that everything should be centralized. It
depends on the kind of business. If the vast majority of
operations/tasks needed are per location, then regional partitioning
works well. If not, then a more centralized approach is needed. For
example, airline reservation and scheduling systems would make a poor
candidate to partition by location because of the interconnectedness of
flights. However, individual stores in a big franchise can operate most
independently.
Post by Patrick May
Post by topmind
"Security" is mostly just massive ACL tables.
That is profoundly . . . naive. I strongly urge you to read
everything you can find by Bruce Schneier, join the cryptography
mailing list run by Perry Metzger, and not say another word about
security until you understand why your statement is so deeply
embarrassing to you. For a quick, very small taste of why ACL tables
don't even begin to scratch the surface of the problem, read
http://www.isi.edu/gost/brian/security/kerberos.html.
I see no mention of ACL's there. If you have a specific case of ACL's
crashing and burning, post it here and let's take a look at it. (Note
that there are a lot of variations of ACL's, so a flaw in one kind is
not necessarily a general flaw in ACL concepts.)

Again, use evidence instead of patronizing insults. It is a bad habit
of yours.
Post by Patrick May
Post by topmind
Post by Patrick May
Post by topmind
You are participating in Domain Bigotry here.
No, he is simply stating the obvious: CRUD/USER applications
are not particularly complex, especially when compared with other
software systems.
Again, I have yet to see an objective or even semi-objective way to
measure "complexity".
I suggest you Google for "software complexity" and you'll find
several million links. Starting from a page like
http://yunus.hun.edu.tr/~sencer/complexity.html will give you pointers to
other research if you are genuinely interested in learning.
That site is down right now. They must be using Kerberos :-)
Post by Patrick May
Post by topmind
Again, the basic concepts of CRUD are easier to learn than most
other domains or problems, but that does not mean that the
implementation and maintainence of related apps is simple.
CRUD applications are, however, not particularly complex as
software systems go. Your claims otherwise indicate a lack of
experience with anything else.
Again, please use evidence to prove me wrong instead of patronizing
insults. It is a bad habit of yours.

Propose a way to measure "complexity", and then apply it to CRUD apps.
That is how you make a point. Anecdotes and private personal opinions
mean very little here. They are a dime a dozen.
Post by Patrick May
Post by topmind
For lack of a better metric, I propose lines of code (LOC) for now
as our metric for complexity.
This is yet another suggestion that shows you don't know much
about the topic you're discussing. Lines of code is not a good
measurement of anything. Do some research.
Present some evidence. It is not my job to make YOUR points for you. If
your evidence is "over there", then go "over there", grab it, and bring
it back.
Post by Patrick May
Post by topmind
Can you make an argument for something being lots of LOC without
being "complex" (beyond learning the domain)? If not, then you are
probably stuck using LOC here.
Some of the largest programs I've seen, in terms of lines of
code, are for generating reports. Getting just the right information
from a large data set and laying it out precisely as requested can be
time consuming, but it's not particularly challenging.
Well, these are rather non-specific words you are using. Perhaps such
tasks bore you, but boredom as a measure of complexity is fraught with
problems.
Post by Patrick May
Object technology is not immune. J2EE in general, and EJBs in
particular, require a great deal of code to provide functionality that
could be provided far more efficiently.
On the other hand, there are some delightfully complex software
systems that consist of only a few hundred lines of code. Functional
languages seem especially good for this. See one of Peter Norvig's
books for a few examples.
Most FP demonstrations of such are "toy" or "lab" examples. The
concepts that keep them simple in a lab environment don't seem to scale
to the real world where more factors and messy little conditions are
involved. I have asked for FP demonstrations of practical code
simplification, and FP fans could not do it. They flunked the
challenge. This is why FP has not made much headway into the practical
world. Outside of lab toys it is not demonstrably better.

However, FP is wandering off topic. Let's try to stay on track.
Post by Patrick May
Lines of code is a useless metric.
Limited perhaps, but not useless. Especially since we have *nothing
else* besides your boredom as a ruler so far.

Perhaps you are a computer genius, but you seem to be having trouble
turning your alleged genius into words.

If you are going to use "complexity" as a metric to make your case, you
better find ways to measure it objectively. Otherwise it is just an
opinionfest. Read up on the scientific process.
Post by Patrick May
Sincerely,
Patrick
-T-
Oliver Wong
2006-01-18 23:01:15 UTC
Permalink
Not going to address every point brought up in this thread; just a few points
here and there...
Post by topmind
Post by Patrick May
Post by topmind
"Security" is mostly just massive ACL tables.
That is profoundly . . . naive. I strongly urge you to read
everything you can find by Bruce Schneier, join the cryptography
mailing list run by Perry Metzger, and not say another word about
security until you understand why your statement is so deeply
embarrassing to you. For a quick, very small taste of why ACL tables
don't even begin to scratch the surface of the problem, read
http://www.isi.edu/gost/brian/security/kerberos.html.
I see no mention of ACL's there. If you have a specific case of ACL's
crashing and burning, post it here and let's take a look at it. (Note
that there are a lot of variations of ACL's, so a flaw in one kind is
not necessarily a general flaw in ACL concepts.)
I'm assuming by "ACL", you mean "Access Control Lists". ACL is one way
to solve the problem of "authorization". That is, given that you know this
person "A" belongs to group "B", does that person have the right to access
resource "C"?

The central topic of that page on Kerberos seems to be "authentication";
that is, how to find out that the person speaking to you is person "A" in
the first place. "authentication" and "authorization" are occasionally
confused, but they are well defined, distinct topics in security. As far as
my understanding of ACL goes, it does not address authentication at all.

There are other topics in computer security too, that ACL doesn't
address. E.g. public key cryptography, webs of trust, encryption, hashing,
non-repudiation, anonymity, pseudo-random number generation, etc. I would
argue that all of these topics are not "sub-topics" of ACL, so it would seem
that there's a lot more to security than just massive ACL tables.
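To make the distinction concrete, a minimal group-based ACL answering only the authorization question might look like this (users, groups, and resources are invented; authentication is assumed to have happened elsewhere):

```python
# Minimal group-based ACL: it answers the authorization question
# ("may this user, via a group, perform this action on this
# resource?") and nothing else -- in particular, it assumes the
# caller has already been authenticated. Names are invented.
groups = {"alice": {"staff", "admins"}, "bob": {"staff"}}
acl = {("admins", "payroll"): {"read", "write"},
       ("staff", "wiki"): {"read"}}

def authorized(user, resource, action):
    return any(action in acl.get((g, resource), set())
               for g in groups.get(user, set()))

print(authorized("alice", "payroll", "write"))  # True
print(authorized("bob", "payroll", "read"))     # False
```

Nothing in those tables helps with proving that the caller really is "alice", which is the problem Kerberos addresses.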
Post by topmind
Post by Patrick May
Post by topmind
For lack of a better metric, I propose lines of code (LOC) for now
as our metric for complexity.
This is yet another suggestion that shows you don't know much
about the topic you're discussing. Lines of code is not a good
measurement of anything. Do some research.
Present some evidence. It is not my job to make YOUR points for you. If
your evidence is "over there", then go "over there", grab it, and bring
it back.
Rather than "I'm right and you're wrong", it might be better to focus on
"This statement is true, and that one is false". I think it's generally
agreed that LOC is a poor metric for software complexity. If Patrick May
wishes to convince topmind of this point, he may cite some documents which
support his claim. If he doesn't particularly care to convince topmind, then
he won't bother to post documents. Either way, I think Patrick May is
confident of his claim. Similarly, if topmind wishes to convince Patrick
May, (s)he too could post some documents. Or maybe (s)he won't bother.
topmind sounds pretty confident too.

I suspect most people who are reading this thread are fairly confident
about their opinion on LOC too, but if they are reasonable people, would be
willing to consider evidence which contradicts their opinions. If such
people haven't yet formed an educated opinion, those people may wish to
start their investigations at
http://en.wikipedia.org/wiki/Source_lines_of_code
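As a quick illustration of why the metric is slippery, the same five-line snippet yields three different "LOC" figures depending on what one decides to count:

```python
# Crude illustration of why LOC is slippery: the same program
# yields different counts depending on what you decide to count.
source = """\
# compute a total
total = 0
for x in [1, 2, 3]:

    total += x  # accumulate
"""

physical = source.count("\n")
nonblank = sum(1 for ln in source.splitlines() if ln.strip())
code_only = sum(1 for ln in source.splitlines()
                if ln.strip() and not ln.strip().startswith("#"))
print(physical, nonblank, code_only)  # 5 4 3
```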
Post by topmind
Post by Patrick May
Post by topmind
Can you make an argument for something being lots of LOC without
being "complex" (beyond learning the domain)? If not, then you are
probably stuck using LOC here.
Some of the largest programs I've seen, in terms of lines of
code, are for generating reports. Getting just the right information
from a large data set and laying it out precisely as requested can be
time consuming, but it's not particularly challenging.
Well, these are rather non-specific words you are using. Perhaps such
tasks bore you, but boredom as a measure of complexity is fraught with
problems.
I don't think Patrick May is proposing using boredom as a measure of
complexity here. Rather, he is implicitly answering your question of "Can
you make an argument for something being lots of LOC without being complex?"
The answer is yes, and the argument is of the form of a counter-example.
Here is something which is lots of LOC, without being complex.

- Oliver
topmind
2006-01-19 02:43:50 UTC
Permalink
Post by Oliver Wong
Not going to address every point brought up in this thread; just a few points
here and there...
Post by topmind
Post by Patrick May
Post by topmind
"Security" is mostly just massive ACL tables.
That is profoundly . . . naive. I strongly urge you to read
everything you can find by Bruce Schneier, join the cryptography
mailing list run by Perry Metzger, and not say another word about
security until you understand why your statement is so deeply
embarrassing to you. For a quick, very small taste of why ACL tables
don't even begin to scratch the surface of the problem, read
http://www.isi.edu/gost/brian/security/kerberos.html.
I see no mention of ACL's there. If you have a specific case of ACL's
crashing and burning, post it here and let's take a look at it. (Note
that there are a lot of variations of ACL's, so a flaw in one kind is
not necessarily a general flaw in ACL concepts.)
I'm assuming by "ACL", you mean "Access Control Lists". ACL is one way
to solve the problem of "authorization". That is, given that you know this
person "A" belongs to group "B", does that person have the right to access
resource "C"?
The central topic of that page on Kerberos seems to be "authentication";
that is, how to find out that the person speaking to you is person "A" in
the first place. "authentication" and "authorization" are occasionally
confused, but they are well defined, distinct topics in security. As far as
my understanding of ACL goes, it does not address authentication at all.
That appears to be more of a network issue than an application issue.
(LDAP is a kind of special-purpose network database actually IIRC, I
would note.)
Post by Oliver Wong
There are other topics in computer security too, that ACL doesn't
address. E.g. public key cryptography, webs of trust, encryption, hashing,
non-repudiation, anonymity, pseudo-random number generation, etc. I would
argue that all of these topics are not "sub-topics" of ACL, so it would seem
that there's a lot more to security than just massive ACL tables.
P. May was not clear. I am generally focusing on applications, not
network implementation. If he wants to argue that RDBMS are no good for
systems software, I probably will not challenge it at this time.
Networks and systems software tend to have to stick to standard
protocols and be optimized for resources. On the other hand, biz apps
live in constantly changing requirements (changing protocols) and
machine resources are generally more plentiful (relative to SS).
Perhaps this is where RDBMS belong: nimble but hardware-hogging.
Post by Oliver Wong
Post by topmind
Post by Patrick May
Post by topmind
For lack of a better metric, I propose lines of code (LOC) for now
as our metric for complexity.
This is yet another suggestion that shows you don't know much
about the topic you're discussing. Lines of code is not a good
measurement of anything. Do some research.
Present some evidence. It is not my job to make YOUR points for you. If
your evidence is "over there", then go "over there", grab it, and bring
it back.
Rather than "I'm right and you're wrong", it might be better to focus on
"This statement is true, and that one is false". I think it's generally
agreed that LOC is a poor metric for software complexity. If Patrick May
wishes to convince topmind of this point, he may cite some documents which
support his claim. If he doesn't particularly care to convince topmind, then
he won't bother to post documents. Either way, I think Patrick May is
confident of his claim. Similarly, if topmind wishes to convince Patrick
May, (s)he too could post some documents. Or maybe (s)he won't bother.
topmind sounds pretty confident too.
I never claimed that code bulk (such as LOC) is a great metric for
"complexity". I am only saying it is the only one we have to work with
until another better one is suggested.

Note that it is easier to pad code to make it longer if it is being
used as an incentive tool, but it is more difficult to make it shorter. As
a writer will tell you, brevity can be tricky. However, sometimes
short code is tough for many to read, and measuring readability is
another black art, putting us back to square zero.
Post by Oliver Wong
I suspect most people who are reading this thread are fairly confident
about their opinion on LOC too, but if they are reasonable people, would be
willing to consider evidence which contradicts their opinions. If such
people haven't yet formed an educated opinion, those people may wish to
start their investigations at
http://en.wikipedia.org/wiki/Source_lines_of_code
Post by topmind
Post by Patrick May
Post by topmind
Can you make an argument for something being lots of LOC without
being "complex" (beyond learning the domain)? If not, then you are
probably stuck using LOC here.
Some of the largest programs I've seen, in terms of lines of
code, are for generating reports. Getting just the right information
from a large data set and laying it out precisely as requested can be
time consuming, but it's not particularly challenging.
Well, these are rather non-specific words you are using. Perhaps such
tasks bore you, but boredom as a measure of complexity is fraught with
problems.
I don't think Patrick May is proposing using boredom as a measure of
complexity here. Rather, he is implicitly answering your question of "Can
you make an argument for something being lots of LOC without being complex?"
The answer is yes, and the argument is of the form of a counter-example.
Here is something which is lots of LOC, without being complex.
But unless he can justify his "less complex" claim with something other
than anecdotes, it won't get us anywhere.

I've seen code that I thought was complex until I understood it. Rocket
science code may look simple to a rocket scientist (domain expert), for
example. CRUD apps involve domains that usually require less formal
education to pick up, so perhaps that just creates the illusion of
simplicity.

I'll propose another rough metric of complexity: Automatability. If
something is easy to automate, then it is less complex. Despite its
appearance, CRUD has resisted automation beyond a certain point. It is
fraught with the "80/20" rule where you need to be able to put
exceptions (variances, not errors) to whatever the framework provides.
For example, maybe 80% of all screen fields map one-to-one to DB
columns. However, the framework must deal with the 20% that don't
directly map. Thus, you cannot directly tie the schema to the screen
without a back-door. Perhaps you can use the schema as a starting
point, but the screen will need custom fiddling in the end.

So, now we have 2 rough candidate metrics for complexity:

* Code bulk (such as Lines-of-Code)
* Automatability
Post by Oliver Wong
- Oliver
-T-
Patrick May
2006-01-19 22:11:37 UTC
Permalink
Post by topmind
Post by Patrick May
The ability to model behavior as well as data makes general
purpose languages better able to model the problem domain than is
SQL.
If you design right, you can *shift* much behavior to being data and
DB operations instead.
Depending on the requirements, some functionality can be
implemented using set operations, certainly. "Much" is pushing it,
especially when one limits oneself to non-gratuitous use of those
operations.
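As a concrete (if toy) example of shifting behavior into a set operation, a per-customer total that would otherwise be a loop in application code becomes one aggregate query (schema and data invented for illustration; sqlite3 is used just to make it runnable):

```python
import sqlite3

# "Shifting behavior into set operations": a per-customer total that
# might otherwise be a loop in application code becomes one aggregate
# query. Schema and data are invented for illustration.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
db.executemany("INSERT INTO orders VALUES (?, ?)",
               [("acme", 10.0), ("acme", 5.0), ("globex", 7.5)])

rows = db.execute("""
    SELECT customer, SUM(amount)
    FROM orders
    GROUP BY customer
    ORDER BY customer
""").fetchall()
print(rows)  # [('acme', 15.0), ('globex', 7.5)]
```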
Post by topmind
SQL is close to being Turing Complete.
In other words, SQL is not Turing complete. That addresses your question:
Post by topmind
Post by Patrick May
Post by topmind
But how are tables less close to the domain than classes,
methods, and attributes?
We're done with that one.
Post by topmind
Plus, OO is usually crappy at modeling behavior, at least in the biz
domain. OO is only nice when things split up into nice hierarchical
taxonomies. Most things don't in reality, so what is left is a mess.
You've been challenged on this assertion in the past and failed
to defend it. The history is available via Google for anyone to see.
Unless you've got more to back up your nonsense than you did before,
repeating this is intellectually dishonest.
Post by topmind
Post by Patrick May
Post by topmind
Post by Patrick May
Proliferation of get/set methods is a code smell.
Immutable objects are to be preferred.
This is not the consensus in the OO community.
Yes, it is. Josh Bloch recommends immutability explicitly in
"Effective Java" and gives solid reasons for his position.
Proliferation of getters and setters violates encapsulation, one
of the defining characteristics of object technology. Some
research will show you that OO designs focus on behavior, not
state. You should also check out the Law of Demeter and similar
guidelines that provide further evidence that excessive use of
accessors and mutators is not good OO form.
Almost all of these have a fair amount of disagreement among OO
proponents. Check out c2.com.
Interesting. I provide explicit examples of what are generally
accepted as good OO principles and practices and you refer to a random
website. If you have real documentation of getter/setter
proliferation being an accepted OO technique, produce it.
Post by topmind
Post by Patrick May
Post by topmind
I would note that a lot of the issues you mentioned, such as
performance, scalability, resiliency, and recoverability can be
obtained by purchasing a "big-iron" RDBMS such as Oracle or
DB2. The configuration and management of those issues is then
almost a commodity skill and not as tied to the domain as a
roll-your-own solution would be (which OO'ers tend to do).
It is statements like this that strongly suggest that you
have never developed a large, complex system.
No, because I break them up into pieces so that they don't grow to
be one big fat EXE. The Big-EXE methodology has a high failure rate.
Modularity is not exclusive to imperative programming. It is
also not the silver bullet that slays the complexity lycanthrope.
Post by topmind
Post by Patrick May
The vast majority of businesses that need systems of this
complexity have legacy software consisting of a number of COTS
applications and custom components, none of which were designed to
work with each other. These have been selected or developed for
good business reasons and cannot be aggregated and run on a single
piece of kit, no matter how large.
Agreed, but what does the existence of legacy apps have to do with
your scaling claim?
The existence of legacy systems is just one reason why your
suggestion of using '"big-iron" RDBMS such as Oracle or DB2' cannot
solve the complex problems of large organizations.
Post by topmind
I did not mean to suggest that everything should be centralized.
That is what you suggested above.
Post by topmind
It depends on the kind of business. If the vast majority of
operations/tasks needed are per location, then regional partitioning
works well. If not, then a more centralized approach is needed. For
example, airline reservation and scheduling systems would make a
poor candidate to partition by location because of the
interconnectedness of flights. However, individual stores in a big
franchise can operate most independently.
Your lack of experience with large, complex systems is showing,
again. Basically you're suggesting one or more monolithic processing
hubs -- the simple CRUD/USER stuff you're used to writ large. Are
those the only kinds of problems you're used to?
Post by topmind
Post by Patrick May
Post by topmind
"Security" is mostly just massive ACL tables.
That is profoundly . . . naive. I strongly urge you to read
everything you can find by Bruce Schneier, join the cryptography
mailing list run by Perry Metzger, and not say another word about
security until you understand why your statement is so deeply
embarrassing to you. For a quick, very small taste of why ACL
tables don't even begin to scratch the surface of the problem,
read http://www.isi.edu/gost/brian/security/kerberos.html.
I see no mention of ACL's there.
That's my point.
Post by topmind
If you have a specific case of ACL's crashing and burning, post it
here and let's take a look at it. (Note that there are a lot of
variations of ACL's, so a flaw in one kind is not necessarily a
general flaw in ACL concepts.)
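For concreteness, the ACL-table model topmind is describing can be
sketched in a few lines (the users, resources, and actions here are
hypothetical, not from the thread):

```python
# Minimal sketch of an ACL table: each permission is a row, and an
# authorization check is a simple membership lookup. In a real system
# these rows would live in a database table.
acl = {
    # (user, resource, action) -- hypothetical example data
    ("alice", "invoice", "read"),
    ("alice", "invoice", "write"),
    ("bob",   "invoice", "read"),
}

def allowed(user, resource, action):
    # Authorization reduces to a lookup over the permission rows.
    return (user, resource, action) in acl
```

Note that this covers authorization only; authentication, key
distribution, and transport security (the Kerberos material cited
above) are separate problems that a permission table does not address.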
I never claimed that ACL's crash and burn. I said that ACLs
barely scratch the surface of the security requirements of a large
distributed system. Clearly you don't have any experience with such.
Post by topmind
Again, use evidence instead of patronizing insults. It is a bad
habit of yours.
I can see how my habits of calling bullshit when I smell it and
not suffering fools gladly would be considered "bad" by someone with
your proclivities. That doesn't change the fact that your claim that
"massive ACL tables" address the security requirements of large
distributed systems is ridiculous on its face. Patronization is the
best you can expect when you spew nonsense like that.
Post by topmind
Post by Patrick May
CRUD applications are, however, not particularly complex as
software systems go. Your claims otherwise indicate a lack of
experience with anything else.
Again, please use evidence to prove me wrong instead of patronizing
insults. It is a bad habit of yours.
How is that patronizing? It's a simple statement of fact. There
is a reason why the CRUD work is typically given to new hires and
junior developers.
Post by topmind
Propose a way to measure "complexity", and then apply it to CRUD
apps. That is how you make a point. Anecdotes and private personal
opinions mean very little here. They are a dime a dozen.
If you're seriously suggesting that CRUD applications are equal
in complexity to compilers, telco OSS/BSS, MRP/ERP, or risk analytics,
just to pull a few examples off the top of my head, then that reflects
more on your experience than on the veracity of your claim.
Post by topmind
Post by Patrick May
On the other hand, there are some delightfully complex
software systems that consist of only a few hundred lines of code.
Functional languages seem especially good for this. See one of
Peter Norvig's books for a few examples.
Most FP demonstrations of such are "toy" or "lab" examples.
Dismissing out of hand systems of which you know nothing. That's
a bad habit of yours.

I'd be tempted to ascribe your apparent need to continue
discussions ad nauseam to some form of obsessive-compulsive disorder,
but I don't have a background in psychology so I won't. You should
try that not-talking-about-things-you-know-nothing-about approach
sometime.

Sincerely,

Patrick

------------------------------------------------------------------------
S P Engineering, Inc. | The experts in large scale distributed OO
| systems design and implementation.
***@spe.com | (C++, Java, Common Lisp, Jini, CORBA, UML)
topmind
2006-01-21 07:52:53 UTC
Post by Patrick May
Post by topmind
Post by Patrick May
The ability to model behavior as well as data makes general
purpose languages better able to model the problem domain than is
SQL.
If you design right, you can *shift* much behavior to being data and
DB operations instead.
Depending on the requirements, some functionality can be
implemented using set operations, certainly. "Much" is pushing it,
especially when one limits oneself to non-gratuitous use of those
operations.
I stand by that claim. However, fair warning: in the end it turns into a
Laynes Law mess tied to the definition of "data" versus "code" and/or
behavior.
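The shift topmind describes can be illustrated with a small sketch
(the discount rules here are hypothetical, not from the thread):

```python
# Table-driven design: a rule table replaces a chain of if/else
# branches. The rows could just as well be a database relation.
discount_rules = [
    # (minimum order total, discount rate) -- hypothetical data
    (1000, 0.10),
    (500, 0.05),
    (0, 0.00),
]

def discount_for(total):
    # Behavior becomes a lookup over data rather than branching code:
    # scan thresholds from highest to lowest and take the first match.
    for threshold, rate in sorted(discount_rules, reverse=True):
        if total >= threshold:
            return rate
```

Changing a rule is then a data edit (or an UPDATE statement) instead
of a code change, which is the sense in which behavior "moves into"
the database.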
Post by Patrick May
Post by topmind
SQL is close to being Turing Complete.
In other words, SQL is not Turing complete. That addresses your
No, because I never claimed "all".
Post by Patrick May
Post by topmind
Post by Patrick May
Post by topmind
But how are tables less close to the domain than classes,
methods, and attributes?
We're done with that one.
Post by topmind
Plus, OO is usually crappy at modeling behavior, at least in the biz
domain. OO is only nice when things split up into nice hierarchical
taxonomies. Most things don't in reality, so what is left is a mess.
You've been challenged on this assertion in the past and failed
to defend it. The history is available via Google for anyone to see.
Unless you've got more to back up your nonsense than you did before,
repeating this is intellectually dishonest.
My crappometer crapped out. The burden is not on me to prove that OO is
better. It is on you. The default is equal or unknown.
Post by Patrick May
Post by topmind
Post by Patrick May
Post by topmind
Post by Patrick May
Proliferation of get/set methods is a code smell.
Immutable objects are to be preferred.
This is not the consensus in the OO community.
Yes, it is. Josh Bloch recommends immutability explicitly in
"Effective Java" and gives solid reasons for his position.
Proliferation of getters and setters violates encapsulation, one
of the defining characteristics of object technology. Some
research will show you that OO designs focus on behavior, not
state. You should also check out the Law of Demeter and similar
guidelines that provide further evidence that excessive use of
accessors and mutators is not good OO form.
Almost all of these have a fair amount of disagreement among OO
proponents. Check out c2.com.
Interesting. I provide explicit examples of what are generally
accepted as good OO principles and practices and you refer to a random
website. If you have real documentation of getter/setter
proliferation being an accepted OO technique, produce it.
Do you have the opposite?
Post by Patrick May
Post by topmind
Post by Patrick May
Post by topmind
I would note that a lot of the issues you mentioned, such as
performance, scalability, resiliency, and recoverability can be
obtained by purchasing a "big-iron" RDBMS such as Oracle or
DB2. The configuration and management of those issues is then
almost a commodity skill and not as tied to the domain as a
roll-your-own solution would be (which OO'ers tend to do).
It is statements like this that strongly suggest that you
have never developed a large, complex system.
No, because I break them up into pieces so that they don't grow to
be one big fat EXE. The Big-EXE methodology has a high failure rate.
Modularity is not exclusive to imperative programming. It is
also not the silver bullet that slays the complexity lycanthrope.
It works as well as anything else for biz apps.
Post by Patrick May
Post by topmind
Post by Patrick May
The vast majority of businesses that need systems of this
complexity have legacy software consisting of a number of COTS
applications and custom components, none of which were designed to
work with each other. These have been selected or developed for
good business reasons and cannot be aggregated and run on a single
piece of kit, no matter how large.
Agreed, but what does the existence of legacy apps have to do with
your scaling claim?
The existence of legacy systems is just one reason why your
suggestion of using '"big-iron" RDBMS such as Oracle or DB2' cannot
solve the complex problems of large organizations.
I am not following you here. If some of the processing has to be on
legacy systems then it has to be on legacy systems. The introduction of
a DB does not change that.

BTW, I've encountered such systems where legacy info was converted to
RDBMS data via periodic updates so that it could be used more easily.
Post by Patrick May
Post by topmind
I did not mean to suggest that everything should be centralized.
That is what you suggested above.
I didn't say it must be centralized. Perhaps I should have said "big
iron(s)". That better?
Post by Patrick May
Post by topmind
It depends on the kind of business. If the vast majority of
operations/tasks needed are per location, then regional partitioning
works well. If not, then a more centralized approach is needed. For
example, airline reservation and scheduling systems would make a
poor candidate to partition by location because of the
interconnectedness of flights. However, individual stores in a big
franchise can operate most independently.
Your lack of experience with large, complex systems is showing,
again.
You always claim this. Offer evidence that your alternative is better
instead of belittling people without evidence. That is rude and bad
debating. Even a chimp can claim he/she is smart and a know-it-all. The
hard part is demonstrating it.
Post by Patrick May
Basically you're suggesting one or more monolithic processing
hubs -- the simple CRUD/USER stuff you're used to writ large. Are
those the only kinds of problems you're used to?
What kind of apps do you have in mind?
Post by Patrick May
Post by topmind
Post by Patrick May
Post by topmind
"Security" is mostly just massive ACL tables.
That is profoundly . . . naive. I strongly urge you to read
everything you can find by Bruce Schneier, join the cryptography
mailing list run by Perry Metzger, and not say another word about
security until you understand why your statement is so deeply
embarrassing to you. For a quick, very small taste of why ACL
tables don't even begin to scratch the surface of the problem,
read http://www.isi.edu/gost/brian/security/kerberos.html.
I see no mention of ACL's there.
That's my point.
So if Mr. Schneier does not mention it, it is no good?
Post by Patrick May
Post by topmind
If you have a specific case of ACL's crashing and burning, post it
here and let's take a look at it. (Note that there are a lot of
variations of ACL's, so a flaw in one kind is not necessarily a
general flaw in ACL concepts.)
I never claimed that ACL's crash and burn. I said that ACLs
barely scratch the surface of the security requirements of a large
distributed system. Clearly you don't have any experience with such.
Clearly you have no evidence to present.
Post by Patrick May
Post by topmind
Again, use evidence instead of patronizing insults. It is a bad
habit of yours.
I can see how my habits of calling bullshit when I smell it and
not suffering fools gladly would be considered "bad" by someone with
your proclivities. That doesn't change the fact that your claim that
"massive ACL tables" address the security requirements of large
distributed systems is ridiculous on its face. Patronization is the
best you can expect when you spew nonsense like that.
Try something novel, like, say..........counter evidence?
Post by Patrick May
Post by topmind
Post by Patrick May
CRUD applications are, however, not particularly complex as
software systems go. Your claims otherwise indicate a lack of
experience with anything else.
Again, please use evidence to prove me wrong instead of patronizing
insults. It is a bad habit of yours.
How is that patronizing? It's a simple statement of fact. There
is a reason why the CRUD work is typically given to new hires and
junior developers.
You have a strange sense of biz apps.
Post by Patrick May
Post by topmind
Propose a way to measure "complexity", and then apply it to CRUD
apps. That is how you make a point. Anecdotes and private personal
opinions mean very little here. They are a dime a dozen.
If you're seriously suggesting that CRUD applications are equal
in complexity to compilers, telco OSS/BSS, MRP/ERP, or risk analytics,
just to pull a few examples off the top of my head, then that reflects
more on your experience than on the veracity of your claim.
Easier domain to learn, yes. Less complex, no. I stand by that unless
you can come up with a decent measurement for complexity. Personal
opinions are a dime-a-dozen. Evidence backing such opinions is rare,
and certainly in your vicinity. The two candidate metrics
proposed, code bulk and automatability, show biz apps not to be measurably
simpler.
Post by Patrick May
Post by topmind
Post by Patrick May
On the other hand, there are some delightfully complex
software systems that consist of only a few hundred lines of code.
Functional languages seem especially good for this. See one of
Peter Norvig's books for a few examples.
Most FP demonstrations of such are "toy" or "lab" examples.
Dismissing out of hand systems of which you know nothing. That's
a bad habit of yours.
I gave them a chance to strut their stuff. If they can't strut right,
it ain't my fault.
Post by Patrick May
I'd be tempted to ascribe your apparent need to continue
discussions ad nauseam to some form of obsessive-compulsive disorder,
but I don't have a background in psychology so I won't. You should
try that not-talking-about-things-you-know-nothing-about approach
sometime.
Prove you are full of truth instead of claim it.

Hey, weren't you blown away by others on your claim that "relational"
was mostly about links? You never even admitted defeat. Stubborn to the
stub you are.
Post by Patrick May
Sincerely,
Patrick
-T-
Patrick May
2006-01-23 21:45:11 UTC
Post by topmind
Post by Patrick May
Post by topmind
SQL is close to being Turing Complete.
In other words, SQL is not Turing complete. That addresses
No, because I never claimed "all".
You asked "But how are tables less close to the domain than
classes, methods, and attributes?" The answer is, they lack
behavior. The most common language for manipulating tables is SQL and
it is not as powerful as general purpose OO languages.
Post by topmind
Post by Patrick May
Post by topmind
Plus, OO is usually crappy at modeling behavior, at least in the
biz domain. OO is only nice when things split up into nice
hierarchical taxonomies. Most things don't in reality, so what
is left is a mess.
You've been challenged on this assertion in the past and
failed to defend it. The history is available via Google for
anyone to see. Unless you've got more to back up your nonsense
than you did before, repeating this is intellectually dishonest.
My crappometer crapped out. The burden is not on me to prove that OO
is better. It is on you. The default is equal or unknown.
Not only are you making the same unfounded claim as you have
repeatedly in the past, you are attempting to squirm in exactly the
same way. You made the claim, you have the burden of proof. Put up
or shut up (we should be so lucky).
Post by topmind
Post by Patrick May
Post by topmind
Almost all of these have a fair amount of disagreement among OO
proponents. Check out c2.com.
Interesting. I provide explicit examples of what are
generally accepted as good OO principles and practices and you
refer to a random website. If you have real documentation of
getter/setter proliferation being an accepted OO technique,
produce it.
Do you have the opposite?
Provided right above. Here it is again, for your convenience:

Josh Bloch recommends immutability explicitly in "Effective Java"
and gives solid reasons for his position. Proliferation of
getters and setters violates encapsulation, one of the defining
characteristics of object technology. Some research will show
you that OO designs focus on behavior, not state. You should
also check out the Law of Demeter and similar guidelines that
provide further evidence that excessive use of accessors and
mutators is not good OO form.

Now, where is your evidence that proliferation of accessors and
mutators is considered good OO practice?
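Bloch's recommendation is easy to sketch. The examples in "Effective
Java" are Java, so the following is only a rough Python analogue, with
a hypothetical value class:

```python
from dataclasses import dataclass, replace

# An immutable value object: state is fixed at construction, so no
# setters exist and invariants cannot be broken after the fact.
@dataclass(frozen=True)
class Account:
    owner: str
    balance: int

    def deposit(self, amount: int) -> "Account":
        # Behavior yields a new object instead of mutating this one.
        return replace(self, balance=self.balance + amount)

a = Account("alice", 100)
b = a.deposit(50)   # a is unchanged; b carries the new state
```

Attempting `a.balance = 0` raises FrozenInstanceError, which is the
point: callers interact through behavior, not through setters.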
Post by topmind
Post by Patrick May
The existence of legacy systems is just one reason why your
suggestion of using '"big-iron" RDBMS such as Oracle or DB2'
cannot solve the complex problems of large organizations.
I am not following you here. If some of the processing has to be on
legacy systems then it has to be on legacy systems. The introduction
of a DB does not change that.
Exactly. You stated that:

I would note that a lot of the issues you mentioned, such as
performance, scalability, resiliency, and recoverability can be
obtained by purchasing a "big-iron" RDBMS such as Oracle or DB2.

This is simply not the case in real world systems. The existence of
legacy systems is just one reason why your "just use a big database"
approach won't meet NFRs such as performance, scalability, resiliency,
and recoverability.
Post by topmind
Post by Patrick May
Post by topmind
It depends on the kind of business. If the vast majority of
operations/tasks needed are per location, then regional
partitioning works well. If not, then a more centralized
approach is needed. For example, airline reservation and
scheduling systems would make a poor candidate to partition by
location because of the interconnectedness of flights. However,
individual stores in a big franchise can operate most
independently.
Your lack of experience with large, complex systems is
showing, again.
You always claim this. Offer evidence that your alternative is
better instead of belittling people without evidence. That is rude
and bad debating. Even a chimp can claim he/she is smart and a
know-it-all. The hard part is demonstrating it.
As you've shown.

The point here is that statements like the one above indicate
more about the limited types of systems to which you apparently have
some minimal exposure than they do about good software development
practice. Experience in this newsgroup has shown that attempting to
educate you about software outside of your tiny box is a waste of
time. Hence, I simply point out when you are saying more than you
think you are, and move on.
Post by topmind
Post by Patrick May
Post by topmind
Post by Patrick May
Post by topmind
"Security" is mostly just massive ACL tables.
That is profoundly . . . naive. I strongly urge you to
read everything you can find by Bruce Schneier, join the
cryptography mailing list run by Perry Metzger, and not say
another word about security until you understand why your
statement is so deeply embarrassing to you. For a quick, very
small taste of why ACL tables don't even begin to scratch the
surface of the problem, read
http://www.isi.edu/gost/brian/security/kerberos.html.
I see no mention of ACL's there.
That's my point.
So if Mr. Schneier does not mention it, it is no good?
I'll type more slowly. There is no mention of ACLs in the
references I provided. The references I provided show how some
important security issues are addressed. This demonstrates that your
claim that '"Security" is mostly just massive ACL tables.' is
nonsense.

Were any of those words too big?
Post by topmind
Post by Patrick May
Post by topmind
Post by Patrick May
CRUD applications are, however, not particularly complex
as software systems go. Your claims otherwise indicate a lack
of experience with anything else.
Again, please use evidence to prove me wrong instead of
patronizing insults. It is a bad habit of yours.
How is that patronizing? It's a simple statement of fact.
There is a reason why the CRUD work is typically given to new
hires and junior developers.
You have a strange sense of biz apps.
You have a nasty tendency of equivocating between "CRUD
applications" and "biz apps".
Post by topmind
Post by Patrick May
If you're seriously suggesting that CRUD applications are
equal in complexity to compilers, telco OSS/BSS, MRP/ERP, or risk
analytics, just to pull a few examples off the top of my head,
then that reflects more on your experience than on the veracity of
your claim.
Easier domain to learn, yes. Less complex, no. I stand by that
unless you can come up with a decent measurement for
complexity. Personal opinions are a dime-a-dozen. Evidence backing
such opinions is rare, and certainly in your vicinity. The two
candidate metrics proposed, code bulk and automatability, show biz
apps not to be measurably simpler.
Considered technical views of experienced software developers are
valuable, especially for other developers with the experience to
assimilate those views. You demonstrate your inexperience by
repeatedly taking positions like the one above, which boil down to "I
am unable or unwilling to understand your position, therefore mine is
equally valid." Does that make you feel all warm and fuzzy inside?
Post by topmind
Post by Patrick May
Post by topmind
Post by Patrick May
On the other hand, there are some delightfully complex
software systems that consist of only a few hundred lines of
code. Functional languages seem especially good for this.
See one of Peter Norvig's books for a few examples.
Most FP demonstrations of such are "toy" or "lab" examples.
Dismissing out of hand systems of which you know nothing. That's
a bad habit of yours.
I gave them a chance to strut their stuff. If they can't strut
right, it ain't my fault.
Yeah, I can see how systems like Macsyma, RAX (controlling Deep
Space 1), Orbitz, the scheduling system for Gulf War 1, countless
expert systems, electronics layout, and CAD systems, just off the top
of my head, aren't too impressive. Kent Pitman put it best:

"Please don't assume Lisp is only useful for Animation and
Graphics, AI, Bioinformatics, B2B and E-Commerce, Data Mining,
EDA/Semiconductor applications, Expert Systems, Finance,
Intelligent Agents, Knowledge Management, Mechanical CAD,
Modeling and Simulation, Natural Language, Optimization,
Research, Risk Analysis, Scheduling, Telecom, and Web Authoring
just because these are the only things they happened to list."

All toys, of course.
Post by topmind
Hey, weren't you blown away by others on your claim that
"relational" was mostly about links? You never even admitted
defeat. Stubborn to the stub you are.
No, that wasn't me. I suspect that either a) you know that or b)
you have so little integrity that you are willing to make false claims
to score rhetorical points in your own mind. Try showing some
intellectual honesty and either provide some evidence or retract your
false claim.

Sincerely,

Patrick

------------------------------------------------------------------------
S P Engineering, Inc. | The experts in large scale distributed OO
| systems design and implementation.
***@spe.com | (C++, Java, Common Lisp, Jini, CORBA, UML)
Mikito Harakiri
2006-01-24 01:28:12 UTC
Post by Patrick May
The most common language for manipulating tables is SQL and
it is not as powerful as general purpose OO languages.
There are two incorrect assertions here:
1. What power do you have in mind, computational power? Then you made
it sound like it is OO that added more power, while in fact procedural
programming without object extensions is as powerful as procedural
programming with them.
2. Is SQL really less powerful? What computational feature is exactly
missing?
Dmitry A. Kazakov
2006-01-24 09:13:02 UTC
Post by Mikito Harakiri
Post by Patrick May
The most common language for manipulating tables is SQL and
it is not as powerful as general purpose OO languages.
1. What power do you have in mind, computational power? Then you made
it sound like it is OO that added more power, while in fact procedural
programming without object extensions is as powerful as procedural
programming with them.
This is true. It cannot be answered without software metrics or an
equivalent. By "power", abstraction power is meant, which is quite
difficult to measure. In my view a measure could be the type system, i.e.
the ladder value -> type -> types set -> types sets set ... and the
completeness of each step*. Others use the nGL hierarchy.
Post by Mikito Harakiri
2. Is SQL really less powerful? What computational feature is exactly
missing?
One could point to Turing completeness, but clearly, it isn't a real,
immediate loss. Completeness is rather a precondition. It does not imply
anything. If my application area does not require something a Turing
machine can do, then I don't care.

----------
* This is why I count SQL as extremely low-level, comparable to Algol-60.
--
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de
Mikito Harakiri
2006-01-24 21:11:52 UTC
Post by Dmitry A. Kazakov
Post by Mikito Harakiri
Post by Patrick May
The most common language for manipulating tables is SQL and
it is not as powerful as general purpose OO languages.
1. What power do you have in mind, computational power? Then you made
it sound like it is OO that added more power, while in fact procedural
programming without object extensions is as powerful as procedural
programming with them.
This is true. It cannot be answered without software metrics or an
equivalent. By "power", abstraction power is meant, which is quite
difficult to measure. In my view a measure could be the type system, i.e.
the ladder value -> type -> types set -> types sets set ... and the
completeness of each step*. Others use the nGL hierarchy.
Well, what about extensible DBMS engines, where you can add new type
definitions?
Post by Dmitry A. Kazakov
Post by Mikito Harakiri
2. Is SQL really less powerful? What computational feature is exactly
missing?
One could point to Turing completeness, but clearly, it isn't a real,
immediate loss.
Is SQL Turing incomplete?
Dmitry A. Kazakov
2006-01-25 13:02:43 UTC
Post by Mikito Harakiri
Post by Dmitry A. Kazakov
Post by Mikito Harakiri
Post by Patrick May
The most common language for manipulating tables is SQL and
it is not as powerful as general purpose OO languages.
1. What power do you have in mind, computational power? Then you made
it sound like it is OO that added more power, while in fact procedural
programming without object extensions is as powerful as procedural
programming with them.
This is true. It cannot be answered without software metrics or an
equivalent. Under power, abstraction power is meant. Which is quite
difficult to measure. In my view a measure could be the type system, i.e.
the ladder value -> type -> types set -> types sets set ... and
completeness of each footstep*. Others use nGL hierarchy.
Well, what about extensible DBMS engines, where you can add new type
definitions?
Then I would say that the corresponding language is more powerful than SQL.
--
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de
Patrick May
2006-01-24 21:08:16 UTC
Post by Mikito Harakiri
Post by Patrick May
The most common language for manipulating tables is SQL and
it is not as powerful as general purpose OO languages.
1. What power do you have in mind, computational power? Then you
made it sound like it is OO that added more power, while in fact
procedural programming without object extensions is as powerful as
procedural programming with them.
In this context I used "power" in terms of expressiveness and
computational capabilities. You are correct in that I did not need to
limit the comparison to OO languages.
Post by Mikito Harakiri
2. Is SQL really less powerful? What computational feature is
exactly missing?
It is possible to write a database management system with an SQL
interface using any general purpose programming language. It is not
possible to implement a general purpose programming language in SQL.
That's a clear indication that SQL is less capable, and hence less
powerful, than a general purpose language.

Sincerely,

Patrick

------------------------------------------------------------------------
S P Engineering, Inc. | The experts in large scale distributed OO
| systems design and implementation.
***@spe.com | (C++, Java, Common Lisp, Jini, CORBA, UML)
Mikito Harakiri
2006-01-24 22:26:08 UTC
Post by H. S. Lahman
It is not
possible to implement a general purpose programming language in SQL.
That's a clear indication that SQL is less capable, and hence less
powerful, than a general purpose language.
I wouldn't be so sure. All that is required is to express one of the
standard computational models in SQL.

Consider primitive recursive functions. Clearly constants, increments,
projection are expressible in SQL. Composition is done via join with
join conditions matching functions inputs and outputs. Primitive
recursion is done via linear recursion via subquery factoring clause.
Granted, we still need to step up from primitive recursive functions to
general recursive functions, but the assertion that there is a gap in SQL
expressiveness is not obvious at all. (And with those goofy spreadsheet
extensions -- aka the MODEL clause -- I doubt there is any gap at all.)
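The construction Mikito outlines can be sketched with a recursive CTE,
the ANSI form of Oracle's subquery factoring clause. This example uses
SQLite (a later feature than the 2006 thread) via Python and computes
factorial, a classic primitive-recursive function, in pure SQL:

```python
import sqlite3

# Primitive recursion expressed as a recursive common table
# expression: the anchor row is the base case f(0) = 1, and the
# recursive member applies the step f(n + 1) = f(n) * (n + 1).
conn = sqlite3.connect(":memory:")
rows = conn.execute("""
    WITH RECURSIVE fact(n, f) AS (
        SELECT 0, 1
        UNION ALL
        SELECT n + 1, f * (n + 1) FROM fact WHERE n < 5
    )
    SELECT n, f FROM fact
""").fetchall()

print(rows)  # [(0, 1), (1, 1), (2, 2), (3, 6), (4, 24), (5, 120)]
```

Composition is then a join matching one function's output columns to
another's input columns, as the post describes.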
Alfredo Novoa
2006-01-25 12:38:35 UTC
Post by Patrick May
It is possible to write a database management system with an SQL
interface using any general purpose programming language. It is not
possible to implement a general purpose programming language in SQL.
Why not?

It is perfectly possible. SQL has a procedural part.

Any program coded in a general purpose programming language might be
replicated in SQL with a similar code size, but if you want to get the
same result as a single SQL statement using a general purpose language
you might need many thousands of code lines.
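A hedged illustration of the size asymmetry being claimed: the single
statement below joins, filters into groups, and aggregates declaratively,
work that takes explicit loops, lookups, and accumulators in a 3GL. The
schema and data are invented for the example (SQLite from Python):

```python
# One declarative statement replacing hand-written join/group/sum loops.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dept(id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE emp(id INTEGER PRIMARY KEY, dept_id INTEGER, salary INTEGER);
    INSERT INTO dept VALUES (1, 'Sales'), (2, 'IT');
    INSERT INTO emp VALUES (1, 1, 50), (2, 1, 70), (3, 2, 90);
""")
results = conn.execute("""
    SELECT d.name, SUM(e.salary)
    FROM dept d JOIN emp e ON e.dept_id = d.id
    GROUP BY d.name
    ORDER BY d.name
""").fetchall()
print(results)  # [('IT', 90), ('Sales', 120)]
```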



Regards
topmind
2006-01-24 03:06:52 UTC
Permalink
Post by Patrick May
Post by topmind
Post by Patrick May
Post by topmind
SQL is close to being Turing Complete.
In other words, SQL is not Turing complete. That addresses
No, because I never claimed "all".
You asked "But how are tables less close to the domain than
classes, methods, and attributes?" The answer is, they lack
behavior. The most common language for manipulating tables is SQL and
it is not as powerful as general purpose OO languages.
How does not covering the entire behavior spectrum make it "less close"
to the domain? It only means that its role is not to do the whole
enchilada. It is not *meant* to do the entire app from end to end. I
never claimed that about query languages or DB's. Being an end-to-end
tool is not a prerequisite for being useful or being "general purpose".
Post by Patrick May
Post by topmind
Post by Patrick May
Post by topmind
Almost all of these have a fair amount of disagreement among OO
proponents. Check out c2.com.
Interesting. I provide explicit examples of what are
generally accepted as good OO principles and practices and you
refer to a random website. If you have real documentation of
getter/setter proliferation being an accepted OO technique,
produce it.
Do you have the opposite?
Josh Bloch recommends immutability explicitly in "Effective Java"
and gives solid reasons for his position. Proliferation of
getters and setters violates encapsulation, one of the defining
characteristics of object technology. Some research will show
you that OO designs focus on behavior, not state. You should
also check out the Law of Demeter and similar guidelines that
provide further evidence that excessive use of accessors and
mutators is not good OO form.
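For readers who want the flavor of the immutability guideline being cited:
a minimal sketch of an immutable value object, translated into Python for
illustration (Bloch's advice is stated for Java; the Money class below is
invented, not from any post in this thread):

```python
# An immutable value object: behavior instead of setters. Mutation
# attempts raise FrozenInstanceError; operations return new values.
from dataclasses import dataclass


@dataclass(frozen=True)
class Money:
    amount: int
    currency: str

    def add(self, other: "Money") -> "Money":
        # Returns a new value rather than mutating state in place.
        assert self.currency == other.currency
        return Money(self.amount + other.amount, self.currency)


price = Money(70, "USD").add(Money(30, "USD"))
print(price.amount)  # 100
```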
Now, where is your evidence that proliferation of accessors and
mutators is considered good OO practice?
One opinion means diddly squat. I believe the goal was "consensus".
Those who put accessors around everything did it due reading somebody's
mantra, not out of their own will. I didn't create the flame-wars
between the wrappers and non-wrappers, I only noted their existence.
Post by Patrick May
Post by topmind
Post by Patrick May
The existence of legacy systems is just one reason why your
suggestion of using '"big-iron" RDBMS such as Oracle or DB2'
cannot solve the complex problems of large organizations.
I am not following you here. If some of the processing has to be on
legacy systems then it has to be on legacy systems. The introduction
of a DB does not change that.
I would note that a lot of the issues you mentioned, such as
performance, scalability, resiliency, and recoverability can be
obtained by purchasing a "big-iron" RDBMS such as Oracle or DB2.
This is simply not the case in real world systems. The existence of
legacy systems is just one reason why your "just use a big database"
approach won't meet NFRs such as performance, scalability, resiliency,
and recoverability.
Are you saying your techniques better fit with legacy systems? I would
like to see a demo. I know many shops that copy data from legacy
systems into RDBMS so that they can more easily query, sift, and search
the data. One feature of legacy systems is that a good many cannot
handle a lot of data due to their age. Thus, current technology can
copy what is there into an RDBMS.
Post by Patrick May
The point here is that statements like the one above indicate
more about the limited types of systems to which you apparently have
some minimal exposure than they do about good software development
practice. Experience in this newsgroup has shown that attempting to
educate you about software outside of your tiny box is a waste of
time. Hence, I simply point out when you are saying more than you
think you are, and move on.
Perhaps because they use argument-from-authority instead of evidence
out of bad habit. You show me clear-cut coded proof, and I will change
my mind. Until then, the brochure-talk and black-box-bragging can take
a hike. OOer's seem so surprised and shocked when people ask for
evidence, as if they are above it. "Science is so 70's. Alan Kay made
science obsolete." (demonstrative quote only).
Post by Patrick May
I'll type more slowly. There is no mention of ACLs in the
references I provided. The references I provided show how some
important security issues are addressed. This demonstrates that your
claim that '"Security" is mostly just massive ACL tables.' is
nonsense.
Okay, I see where this went wrong. I meant security *could* be
implemented by massive ACL tables. I did not mean to imply it was the
current norm. I apologize for my wording there.
Post by Patrick May
Post by topmind
Post by Patrick May
Post by topmind
Post by Patrick May
CRUD applications are, however, not particularly complex
as software systems go. Your claims otherwise indicate a lack
of experience with anything else.
Again, please use evidence to prove me wrong instead of
patronizing insults. It is a bad habit of yours.
How is that patronizing? It's a simple statement of fact.
There is a reason why the CRUD work is typically given to new
hires and junior developers.
You have a strange sense of biz apps.
You have a nasty tendency of equivocating between "CRUD
applications" and "biz apps".
In practice the line is blurred. It used to be more separated into
"batch jobs" and "front-end" work, but now people expect instant
answers, so the line has been fading. There is sometimes a split
between "architects" and "coders", but this is largely because
architects know the domain better. But, architects are still often
involved with front-end/output design.
Post by Patrick May
Considered technical views of experienced software developers are
valuable, especially for other developers with the experience to
assimilate those views. You demonstrate your inexperience by
repeatedly taking positions like the one above, which boil down to "I
am unable or unwilling to understand your position, therefore mine is
equally valid." Does that make you feel all warm and fuzzy inside?
One should not be expected to accept anecdotal evidence on face value
here. Opinions from experts and amateurs point every which way. If
anecdotal evidence is all you have, then I am done here. Few come here
to read unjustified positions. There are plenty of other places to get
summary opinions.
Post by Patrick May
Post by topmind
Post by Patrick May
Post by topmind
Post by Patrick May
On the other hand, there are some delightfully complex
software systems that consist of only a few hundred lines of
code. Functional languages seem especially good for this.
See one of Peter Norvig's books for a few examples.
Most FP demonstrations of such are "toy" or "lab" examples.
Dismissing out of hand systems of which you know nothing. That's
a bad habit of yours.
I gave them a chance to strut their stuff. If they can't strut
right, it ain't my fault.
Yeah, I can see how systems like Macsyma, RAX (controlling Deep
Space 1), Orbitz, the scheduling system for Gulf War 1, countless
expert systems, [.....]
The challenge was *not* mere implementation. Even assembler has a lot
of implementations. They claimed "significantly less code". I gave them
a sample application and they made up every (lame) excuse under the
sun. But that is another matter. FP is not the topic.
Post by Patrick May
Post by topmind
Hey, weren't you blown away by others on your claim that
"relational" was mostly about links? [....]
No, that wasn't me. [....]
I apologize if it was not you. I should have checked.
Post by Patrick May
Sincerely,
Patrick
-T-
Patrick May
2006-01-24 21:54:23 UTC
Permalink
Post by topmind
Post by Patrick May
You asked "But how are tables less close to the domain than
classes, methods, and attributes?" The answer is, they lack
behavior. The most common language for manipulating tables is SQL
and it is not as powerful as general purpose OO languages.
How does not covering the entire behavior spectrum make it "less
close" to the domain?
As Mr. Lahman has eloquently pointed out, only CRUD/USER
applications map directly to the relational model. Other applications
require different models of both data and behavior. Since SQL has
limited support for modeling behavior relative to general purpose
languages, by your own admission, it is less capable of reflecting the
abstractions of problem domains other than those of CRUD data
pipelines.
Post by topmind
Post by Patrick May
Josh Bloch recommends immutability explicity in "Effective
Java" and gives solid reasons for his position.
Proliferation of getters and setters violates encapsulation,
one of the defining characteristics of object technology.
Some research will show you that OO designs focus on
behavior, not state. You should also check out the Law of
Demeter and similar guidelines that provide further evidence
that excessive use of accessors and mutators is not good OO
form.
Now, where is your evidence that proliferation of accessors and
mutators is considered good OO practice?
One opinion means diddly squat.
If you count carefully, without even the need to remove your
shoes, you will note more than one generally accepted guideline
mentioned above. Further, dismissing well documented and rationally
defended positions as "opinion" is a cheap rhetorical trick.

Where is your evidence that proliferation of accessors and
mutators is generally accepted as good OO practice?
Post by topmind
Those who put accessors around everything did it due to reading
somebody's mantra, not out of their own will.
So you've seen code that claims to be OO while violating the
principle of encapsulation. That's more exposure to OO than I
suspected you've had, but it doesn't prove your point.

While we're on the topic of you producing evidence for your
Post by topmind
OO is usually crappy at modeling behavior, at least in the biz
domain. OO is only nice when things split up into nice hierarchical
taxonomies.
method getGreenScarvesCostingLessThan100dollars(...) {
sql = "select * from products where prod='scarves' and color='green'
and price < 100"
return(foo.execute(sql))
}
Are you going to back these up or should we consider them retracted?
Post by topmind
Post by Patrick May
I would note that a lot of the issues you mentioned, such as
performance, scalability, resiliency, and recoverability can
be obtained by purchasing a "big-iron" RDBMS such as Oracle
or DB2.
This is simply not the case in real world systems. The existence
of legacy systems is just one reason why your "just use a big
database" approach won't meet NFRs such as performance,
scalability, resiliency, and recoverability.
Are you saying your techniques better fit with legacy systems?
I am saying that your claim is nonsense. Given the need to
integrate legacy systems and third party products, to expose business
functionality to a variety of other business systems, to orchestrate
various levels of processes and workflows, and meet all the other
requirements of enterprise systems, the idea that the NFRs can be met
'by purchasing a "big-iron" RDBMS' is ridiculous.
Post by topmind
Post by Patrick May
The point here is that statements like the one above indicate
more about the limited types of systems to which you apparently
have some minimal exposure than they do about good software
development practice. Experience in this newsgroup has shown that
attempting to educate you about software outside of your tiny box
is a waste of time. Hence, I simply point out when you are saying
more than you think you are, and move on.
Perhaps because they use argument-from-authority instead of evidence
out of bad habit.
When you've supported some of your own claims, demonstrated a
willingness and ability to learn from people with more experience,
shown some intellectual honesty by admitting when you don't know
about a topic or are proven wrong, and have generally refrained from
your typical trollish behavior for some convincing amount of time, it
may be worth devoting the extensive effort required to educate you.
Given your historical behavior, at the moment it would be a waste of
time.
Post by topmind
You show me clear-cut coded proof, and I will change my mind.
No, you won't. You will ignore evidence, deliberately
misunderstand clear arguments, drag threads out interminably with off
topic comments, continually introduce absurd claims that you fail to
support, and basically waste the time of everyone who grabs the tar
baby of your discourse.

Pointing out your flawed assertions when you make them is a more
rational use of time than is trying to have a rational discussion with
you.
Post by topmind
Post by Patrick May
I'll type more slowly. There is no mention of ACLs in the
references I provided. The references I provided show how some
important security issues are addressed. This demonstrates that
your claim that '"Security" is mostly just massive ACL tables.' is
nonsense.
Okay, I see where this went wrong. I meant security *could* be
implemented by massive ACL tables.
No, it cannot. Either you really don't understand this, in which
case you should not be commenting on the topic, or Mr. Lahman is
correct and you are deliberately attempting to annoy people who do
know something about the issue.

ACLs stored in database tables don't even begin to address the
security needs of large systems. Read the references provided.

Sincerely,

Patrick

------------------------------------------------------------------------
S P Engineering, Inc. | The experts in large scale distributed OO
| systems design and implementation.
***@spe.com | (C++, Java, Common Lisp, Jini, CORBA, UML)
Patrick May
2006-01-24 22:00:46 UTC
Permalink
Post by topmind
Post by Patrick May
Post by topmind
Post by Patrick May
Post by topmind
Post by Patrick May
On the other hand, there are some delightfully
complex software systems that consist of only a few
hundred lines of code. Functional languages seem
especially good for this. See one of Peter Norvig's books
for a few examples.
Most FP demonstrations of such are "toy" or "lab" examples.
Dismissing out of hand systems of which you know nothing.
That's a bad habit of yours.
I gave them a chance to strut their stuff. If they can't strut
right, it ain't my fault.
Yeah, I can see how systems like Macsyma, RAX (controlling
Deep Space 1), Orbitz, the scheduling system for Gulf War 1,
countless expert systems, [.....]
The challenge was *not* mere implementation.
The question from you was whether I knew of any complex systems
that didn't involve large amounts of code relative to less complex
systems. I provided such examples.
Post by topmind
Even assembler has a lot of implementations. They claimed
"significantly less code". I gave them a sample application and they
made up every (lame) excuse under the sun. But that is another
matter. FP is not the topic.
I ask this question for the same reason, and with much the same feeling,
as when I take a good look at a serious accident encountered on the
highway, but could you please point me to the location of this
discussion?

Sincerely,

Patrick

------------------------------------------------------------------------
S P Engineering, Inc. | The experts in large scale distributed OO
| systems design and implementation.
***@spe.com | (C++, Java, Common Lisp, Jini, CORBA, UML)
frebe
2006-01-22 13:09:06 UTC
Permalink
Post by Patrick May
There
is a reason why the CRUD work is typically given to new hires and
junior developers.
The reason why CRUD work is given to junior developers is the fact that
using OO design, CRUD applications are very bloated. If RAD tools were
used instead, the same work would be done in a few minutes, saving a
lot of money instead of hiring an army of (junior) developers.

Fredrik Bertilsson
http://butler.sourceforge.net
Alfredo Novoa
2006-01-25 12:49:22 UTC
Permalink
Post by frebe
The reason why CRUD work is given to junior developers is the fact that
using OO design, CRUD applications are very bloated. If RAD tools were
used instead, the same work would be done in a few minutes, saving a
lot of money instead of hiring an army of (junior) developers.
But consultants don't want to save the customer's money, they want to
make money, and the more customer's money they waste, the more money
they get.


Regards
Alfredo
H. S. Lahman
2006-01-15 15:51:44 UTC
Permalink
Responding to Jacobs...
Post by topmind
Post by H. S. Lahman
Post by topmind
Post by H. S. Lahman
SQL /is/ a 3GL.
Depends on how you define GL's. What do you consider an example of a
4GL?
There are a number of definitions. The one I find most useful is that a
4GL language solution is independent of particular computing environment
implementations. IOW, the 4GL expresses solutions purely in problem
space terms. That lets SQL out because it depends upon the specific
RDB implementation of the RDM.
What you are talking about is not a limit, but a "problem" of choice.
If the *only* RDBMS on the planet was say PostgreSql, then such a
complaint would not exist any more than there being only one
implementation of say a Python interpreter. Nobody ever claimed that
Python was a lesser-generation language because there was only one
implementation. One can run Postgre on Unix, Macs (IIRC), PC's, etc. I
would note.
Why demote the rank of something simply because there are choices?
The point is that there are alternative /implementations/ for
persistence to RDBs in the computing space. SQL has already made that
implementation choice.
Post by topmind
Post by H. S. Lahman
UML with a compliant AAL is an example of a 4GL. If I build an OOA
model for, say, a POS Order Entry System, that model can be
unambiguously implemented without change either manually as a print mail
order catalogue or as software for a browser-based 'net application.
The fundamental processing logic of catalogue organization and order
entry is expressed the same way regardless of the implementation context.
And if other people/vendors made their own flavor of this tool with
differences between the implementations, then it would be in the same
boat. Why should implementation A1 and A2 demote the "generation"
ranking of A?
It is not the same thing at all. The 4GL solution does not care if
persistence is /implemented/ with RDBs, OODBs, flat files, paper files,
or clay tablets. The 4GL solution is completely independent of those
choices. SQL is completely dependent on an RDB implementation of
persistence. IOW, SQL has already made the implementation choice.
Post by topmind
Post by H. S. Lahman
Post by topmind
Post by H. S. Lahman
It is just highly specialized to support Data Modeling.
If you want to model data and access it in a very generic fashion,
then SQL is pretty much the only game in town.
OTOH, if you are solving problems where persistence access is of
secondary concern (i.e., applications beyond RAD CRUD/USER pipelines),
then SQL is not a very good language to use because it doesn't handle
dynamic issues well, doesn't support abstraction, it is not very
maintainable, and it is just plain ugly.
Are you talking about SQL specificly, or relational in general?
"Doesn't support abstraction" is a rather sweeping claim.
SQL specifically. IMO it isn't a very good language even for
relational.
I agree that it needs improvement/overhaul. However, it is still a
powerful tool even without fixes.
Post by H. S. Lahman
It survives because of the huge volumes of legacy code
around. Note that when you do P/R you don't program solely in SQL; you
just use SQL to talk to the RDB.
That is Divide and Conquer, letting code do what it does best and the
DB doing what it does best.
It reflects the fact that SQL is not well suited to describing solution
dynamics and general programming, which was my original assertion.
Post by topmind
Post by H. S. Lahman
SQL does support abstraction in the sense that it abstracts the RDB
implementation. However, here I was referring to problem space abstraction.
This depends on how one defines "problem space abstraction". The OO
view of PSA is kinda lame if you ask me. It does not factor out common
"database verbs" into a single tool or convention, but reinvents it
over and over for many classes. Repetitive SET/GET syndrome is an
example of this poor pattern factoring. OO'ers often don't see this
ugly duplication of concept.
I am talking about abstracting the domain where the original problem
exists rather than the computing domain where a software solution will
be executed. SQL only abstracts a very narrow part of the computing domain.

Unfortunately, I agree with May that the rest of the paragraph makes no
sense; it just seems to be your personal jargon and mantras.
Post by topmind
Post by H. S. Lahman
Post by topmind
I am trying to see if there is likely to be yet another
OO-vs-Relational battle breaking out here.
Only indirectly. CRUD/USER pipeline applications have architectures and
infrastructures that are hard-wired around the RDB implementation of the
RDM. That is usually not the case for OO applications solving more
complex problems.
We have been over this already. There is no objective measurement that
says biz apps are "simpler". I will agree that it is usually easier to
learn business domains than say 3D graphics or rocket science, but
learning the domain and implementing apps for it are two different
things.
You are participating in Domain Bigotry here.
I never said business applications are simpler; I spent nearly a decade
doing heavy duty business applications. I am also on record in this
forum for saying several times that modern IT is beginning to look a lot
like R-T/E.

What I implied was that CRUD/USER applications tend to be not very
complex. Report generation was never very taxing even back in the COBOL
days, long before SQL, RDBs, or even the RDM. Substituting a GUI or
browser UI for a printed report doesn't change the fundamental nature of
the processing.

Nonetheless USER/CRUD processing remains a substantial segment of IT
processing, which is why RDBS and SQL are popular and RAD IDE automation
was developed. (Note that RAD IDEs represent an early form of
translation; it was just limited to the CRUD/USER niche.)
Post by topmind
Post by H. S. Lahman
Note that UML static diagrams also implement the RDM
but they do so in a very different way than the RDBs do.
UML takes us back to the navigational/CODASYL pointer/path hell that
proved a mess in the 60's and 70's. UML and OO are the structural GO TO
of the modern age.
You keep trying but I'm still not biting. This is just another of your
stock ploys to pull people's chains. You throw out a deliberately
baseless and nonsensical assertion just to create emotional controversy
to drag people down your anti-OO rabbit hole.


*************
There is nothing wrong with me that could
not be cured by a capful of Drano.

H. S. Lahman
***@pathfindermda.com
Pathfinder Solutions -- Put MDA to Work
http://www.pathfindermda.com
blog: http://pathfinderpeople.blogs.com/hslahman
(888)OOA-PATH
Patrick May
2006-01-15 18:13:40 UTC
Permalink
Unfortunately, I agree with May . . . .
"Unfortunately"? I thought only my wife found something
inherently objectionable about agreeing with me.

Regards,

Patrick

------------------------------------------------------------------------
S P Engineering, Inc. | The experts in large scale distributed OO
| systems design and implementation.
***@spe.com | (C++, Java, Common Lisp, Jini, CORBA, UML)
H. S. Lahman
2006-01-16 15:52:33 UTC
Permalink
Responding to May...
Post by Patrick May
Unfortunately, I agree with May . . . .
"Unfortunately"? I thought only my wife found something
inherently objectionable about agreeing with me.
My intent was that the 'unfortunate' referred to feeling that what
Jacobs said was gibberish. I was just trying to take the high road by
taking the edge off.


*************
There is nothing wrong with me that could
not be cured by a capful of Drano.

H. S. Lahman
***@pathfindermda.com
Pathfinder Solutions -- Put MDA to Work
http://www.pathfindermda.com
blog: http://pathfinderpeople.blogs.com/hslahman
(888)OOA-PATH
topmind
2006-01-15 21:03:18 UTC
Permalink
Post by H. S. Lahman
Responding to Jacobs...
Post by topmind
Post by H. S. Lahman
Post by topmind
Post by H. S. Lahman
SQL /is/ a 3GL.
Depends on how you define GL's. What do you consider an example of a
4GL?
There are a number of definitions. The one I find most useful is that a
4GL language solution is independent of particular computing environment
implementations. IOW, the 4GL expresses solutions purely in problem
space terms. That lets SQL out because it depends upon the specific
RDB implementation of the RDM.
What you are talking about is not a limit, but a "problem" of choice.
If the *only* RDBMS on the planet was say PostgreSql, then such a
complaint would not exist anymore than there being only one
implementation of say a Python interpreter. Nobody ever claimed that
Python was a lessor generation language because there was only one
implementation. One can run Postgre on Unix, Macs (IIRC), PC's, etc. I
would note.
Why demote the rank of something simply because there are choices?
The point is that there are alternative /implementations/ for
persistence to RDBs in the computing space. SQL has already made that
implementation choice.
SQL is not an implementation. What is the difference between locking
yourself to SQL and locking yourself to Java? If you want
open-source, then go with PostgreSQL. What is the diff? Java ain't no
universal language either.
Post by H. S. Lahman
Post by topmind
Post by H. S. Lahman
UML with a compliant AAL is an example of a 4GL. If I build an OOA
model for, say, a POS Order Entry System, that model can be
unambiguously implemented without change either manually as a print mail
order catalogue or as software for a browser-based 'net application.
The fundamental processing logic of catalogue organization and order
entry is expressed the same way regardless of the implementation context.
And if other people/vendors made their own flavor of this tool with
differences between the implementations, then it would be in the same
boat. Why should implementation A1 and A2 demote the "generation"
ranking of A?
It is not the same thing at all. The 4GL solution does not care if
persistence is /implemented/ with RDBs, OODBs, flat files, paper files,
or clay tablets.
For the zillionth time, RDBMS are far more than just "persistence".
Some newer RDBMS don't even have to touch the disk. DB's provide these
services:

# Persistence
# High-level (English or math-like) query languages or query ability
# metadata repository
# State management
# Multi-user contention management and concurrency (locks,
transactions, rollbacks, etc.)
# Backup and replication of data
# Access security
# Data computation/processing (such as aggregation and
cross-referencing)
# Data rule enforcement or validation
# Data export and import utilities
# Multi-language and multi-application data sharing
# Data change and access logging
# Automated "result path" optimization (user focuses on what, not on
how)

You just don't get these with flat files unless you program them in
yourself, which is called "reinventing the DB wheel". Concurrency, joins,
and aggregation alone require a lot of coding. "Persistence" alone does
not cover that. Concurrency, joins, and aggregation are perhaps even
orthogonal to "persistence".

Let's see clay tablets do a join. Have an elephant sit on them? Be my
guest, just tell me when you find all the data and let's compare to the
results to even a CPM-era desktop DB.
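A small sketch of two items from the list above that plain "persistence"
alone does not give you: declarative cross-referencing (a join, here on an
in-memory database that never touches the disk) and transactional state
management. The schema is invented for illustration (SQLite from Python):

```python
# Services beyond persistence: joins and rollback, in memory.
import sqlite3

conn = sqlite3.connect(":memory:")   # no disk involved at all
conn.executescript("""
    CREATE TABLE part(id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE stock(part_id INTEGER, qty INTEGER);
    INSERT INTO part VALUES (1, 'bolt');
    INSERT INTO stock VALUES (1, 40);
""")
# The join is one declarative line, not a hand-coded lookup loop.
print(conn.execute(
    "SELECT p.name, s.qty FROM part p JOIN stock s ON s.part_id = p.id"
).fetchone())  # ('bolt', 40)

# A failed unit of work is undone by the engine, not patched by hand.
try:
    with conn:                                   # commit-or-rollback scope
        conn.execute("UPDATE stock SET qty = 0")
        raise RuntimeError("simulated failure")  # triggers rollback
except RuntimeError:
    pass
print(conn.execute("SELECT qty FROM stock").fetchone()[0])  # 40
```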
Post by H. S. Lahman
The 4GL solution is completely independent of those
choices. SQL is completely dependent on an RDB implementation of
persistence. IOW, SQL has already made the implementation choice.
So being married to Java is better than being married to SQL? Java does
not even have many of the features mentioned above.
Post by H. S. Lahman
Post by topmind
Post by H. S. Lahman
Post by topmind
Post by H. S. Lahman
It is just highly specialized to support Data Modeling.
If you want to model data and access it in a very generic fashion,
then SQL is pretty much the only game in town.
OTOH, if you are solving problems where persistence access is of
secondary concern (i.e., applications beyond RAD CRUD/USER pipelines),
then SQL is not a very good language to use because it doesn't handle
dynamic issues well, doesn't support abstraction, it is not very
maintainable, and it is just plain ugly.
Are you talking about SQL specificly, or relational in general?
"Doesn't support abstraction" is a rather sweeping claim.
SQL specifically. IMO it isn't a very good language even for
relational.
I agree that it needs improvement/overhaul. However, it is still a
powerful tool even without fixes.
Post by H. S. Lahman
It survives because of the huge volumes of legacy code
around. Note that when you do P/R you don't program solely in SQL; you
just use SQL to talk to the RDB.
That is Divide and Conquer, letting code do what it does best and the
DB doing what it does best.
It reflects the fact that SQL is not well suited to describing solution
dynamics and general programming, which was my original assertion.
Well, I will agree that relational languages may not be suitable to
*every* domain, but that does not mean they are not general purpose.
There is no "general purpose" tool/language that is ideal for
everything. Thus, "everything" is an unreachable goal for any
tool/lang. Many things are "semi-general-purpose" and DB's are one of
them.
Post by H. S. Lahman
Post by topmind
Post by H. S. Lahman
SQL does support abstraction in the sense that it abstracts the RDB
implementation. However, here I was referring to problem space abstraction.
This depends on how one defines "problem space abstraction". The OO
view of PSA is kinda lame if you ask me. It does not factor out common
"database verbs" into a single tool or convention, but reinvents it
over and over for many classes. Repetitive SET/GET syndrome is an
example of this poor pattern factoring. OO'ers often don't see this
ugly duplication of concept.
I am talking about abstracting the domain where the original problem
exists rather than the computing domain where a software solution will
be executed. SQL only abstracts a very narrow part of the computing domain.
I disagree. A large part of *most* apps I have seen involves
database-oriented stuff. P. May mentioned security. Security can be
viewed as dealing with large ACL tables. Most algorithms can be
reduced to mostly DB-oriented operations. I had to build a 3D graphics
system in college, and most of it could be reduced to DB-operations:
having "parts" reference each other in many-to-many tables,
transformation steps tracking, looking up polygons, cross-referencing
those polygons with their "parent part", storing scan-lines for later
inspection, etc. I will agree that DB's are not (currently) fast at
such, but still from a logical perspective the operations were
essentially DB-oriented. (Because I couldn't use a DB, I ended up
reinventing a lot of DB idioms and it was not very fun.)
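The "parts reference each other in many-to-many tables" structure
described above can be sketched as a toy bill-of-materials: a recursive
query walks the part graph that the 3D system had to traverse by hand.
All names are invented; this is a sketch, not the college project itself:

```python
# Many-to-many part containment, with a transitive sub-part lookup
# done declaratively instead of with pointer-hopping code.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE part(id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE contains(parent INTEGER, child INTEGER);
    INSERT INTO part VALUES (1, 'scene'), (2, 'robot'), (3, 'arm');
    INSERT INTO contains VALUES (1, 2), (2, 3);
""")
rows = conn.execute("""
    WITH RECURSIVE sub(id) AS (
        SELECT 1                                   -- start at 'scene'
        UNION
        SELECT c.child FROM contains c JOIN sub ON c.parent = sub.id
    )
    SELECT p.name FROM part p JOIN sub ON p.id = sub.id ORDER BY p.id
""").fetchall()
print([r[0] for r in rows])  # ['scene', 'robot', 'arm']
```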

A compiler/interpreter is the same way: a DB would make it simpler to
implement, especially fancy interactive debuggers, but performance/RAM
issues make it not practical. Thus, one spends a lot of time coding
array and pointer hopping to look stuff up, count things, etc. Most of
it is just tracking and associating "stuff".
Post by H. S. Lahman
Unfortunately, I agree with May that the rest of the paragraph makes no
sense; it just seems to be your personal jargon and mantras.
If May said anything else, we should check his temperature. He has
never been friendly to me or my ideas. A polite person would simply ask
for clarification rather than label it "jargon or mantras". You
wouldn't do that to your boss, would you?
Post by H. S. Lahman
Post by topmind
Post by H. S. Lahman
Post by topmind
I am trying to see if there is likely to be yet another
OO-vs-Relational battle breaking out here.
Only indirectly. CRUD/USER pipeline applications have architectures and
infrastructures that are hard-wired around the RDB implementation of the
RDM. That is usually not the case for OO applications solving more
complex problems.
We have been over this already. There is no objective measurement that
says biz apps are "simpler". I will agree that it is usually easier to
learn business domains than say 3D graphics or rocket science, but
learning the domain and implementing apps for it are two different
things.
You are participating in Domain Bigotry here.
I never said business applications are simpler; I spent nearly a decade
doing heavy duty business applications. I am also on record in this
forum for saying several times that modern IT is beginning to look a lot
like R-T/E.
Real-time and embedded? I think they are going away from it because of
auto garbage collection, DB's, etc. -- stuff that RT/E avoids.
Post by H. S. Lahman
What I implied was that CRUD/USER applications tend to be not very
complex. Report generation was never very taxing even back in the COBOL
days, long before SQL, RDBs, or even the RDM. Substituting a GUI or
browser UI for a printed report doesn't change the fundamental nature of
the processing.
Please clarify. If a process was "not taxing", then you are simply
given more duties and projects to work on. Management loads you up
based on your productivity and work-load.
Post by H. S. Lahman
Nonetheless USER/CRUD processing remains a substantial segment of IT
processing, which is why RDBS and SQL are popular and RAD IDE automation
was developed. (Note that RAD IDEs represent an early form of
translation; it was just limited to the CRUD/USER niche.)
Unfortunately good RAD/IDE tools have yet to arrive for web-based apps,
and we're stuck futzing with the limits of low-level HTTP issues.
Post by H. S. Lahman
Post by topmind
Post by H. S. Lahman
Note that UML static diagrams also implement the RDM
but they do so in a very different way than the RDBs do.
UML takes us back to the navigational/CODASYL pointer/path hell that
proved a mess in the 60's and 70's. UML and OO is the structural GO TO
of the modern age.
You keep trying but I'm still not biting. This is just another of your
stock ploys to pull people's chains. You throw out a deliberately
baseless and nonsensical assertion just to create emotional controversy
to drag people down your anti-OO rabbit hole.
You are so cute when you paint me as bad, manipulative, and evil.
Post by H. S. Lahman
*************
There is nothing wrong with me that could
not be cured by a capful of Drano.
H. S. Lahman
-T-
H. S. Lahman
2006-01-16 17:55:59 UTC
Permalink
Responding to Jacobs...
Post by topmind
Post by H. S. Lahman
The point is that there are alternative /implementations/ for
persistence to RDBs in the computing space. SQL has already made that
implementation choice.
SQL is not an implementation. What is the difference between locking
yourself to SQL instead of locking yourself to Java? If you want
open-source, then go with PostgreSQL. What is the diff? Java ain't no
universal language either.
Of course it's an implementation! It implements access to physical
storage. More important to the context here, that implementation is
quite specific to one single paradigm for stored data.

Requirements -> 4GL -> 3GL -> Assembly -> machine code executable

Everything on the left is a specification for what is immediately to its
right. Similarly, everything to the right is a solution implementation
for the specification on its immediate left.

Go look at an SA/D Data Flow Diagram or a UML Activity Diagram. They
express data store access at a high level of abstraction that is
independent of the actual storage mechanism. SQL, ISAM, CODASYL, gets,
or any other access mechanism, is then an implementation of that generic
access specification.

Java is certainly a general purpose 3GL. Like most 3GLs there are
situations where there are better choices (e.g., lack of BCD arithmetic
support makes it a poor choice for a General Ledger), but one could
still use it in those situations. SQL, in contrast, is a niche language
that just doesn't work for many situations outside its niche.
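The BCD point deserves a concrete illustration. Java's float/double (like any binary floating point) cannot represent most decimal fractions exactly; here is a sketch in Python, with the decimal module standing in for BCD-style ledger arithmetic:

```python
from decimal import Decimal

# Summing $0.10 three times: binary floating point drifts, while decimal
# arithmetic (the role BCD plays in ledger code) stays exact.
binary_total = 0.10 + 0.10 + 0.10
decimal_total = Decimal("0.10") + Decimal("0.10") + Decimal("0.10")

print(binary_total == 0.3)               # → False
print(decimal_total == Decimal("0.30"))  # → True
```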

BTW, remember that I am a translationist. When I do a UML model, I
don't care what language the transformation engine targets in the model
implementation. (In fact, transformation engines for R-T/E typically
target straight C from the OOA models for performance reasons.) Thus
every 3GL (or Assembly) represents a viable alternative implementation
of the notion of '3GL'.
Post by topmind
Post by H. S. Lahman
Post by topmind
Post by H. S. Lahman
UML with a compliant AAL is an example of a 4GL. If I build an OOA
model for, say, a POS Order Entry System, that model can be
unambiguously implemented without change either manually as a print mail
order catalogue or as software for a browser-based 'net application.
The fundamental processing logic of catalogue organization and order
entry is expressed the same way regardless of the implementation context.
And if other people/vendors made their own flavor of this tool with
differences between the implementation, then it would be in the same
boat. Why should implementation A1 and A2 demote the "generation"
ranking of A?
It is not the same thing at all. The 4GL solution does not care if
persistence is /implemented/ with RDBs, OODBs, flat files, paper files,
or clay tablets.
For the zillionth time, RDBMS are far more than just "persistence".
It is only if one refuses to manage complexity by separating logical
concerns. Render unto the Disk generic static storage and render unto
the Application context-dependent dynamics.

* 1
[Context] ----------------- [Data]


1 *
[Problem Solution] -------- [Data]

The first view is the basis of the RDB paradigm -- generic storage of
the same data for access by many different contexts. The second view is
the one that is relevant for solving large problems -- access of data
that is carefully tailored to the problem in hand. Storing and
accessing data for many different contexts is a quite different problem
than formatting and manipulating data to solve a specific problem.

For a non-CRUD/USER application, SQL and the DBMS provide the first
relationship while a persistence access subsystem provides the
reformatting for the second relationship.
Post by topmind
Post by H. S. Lahman
Post by topmind
Post by H. S. Lahman
SQL does support abstraction in the sense that it abstracts the RDB
implementation. However, here I was referring to problem space abstraction.
This depends on how one defines "problem space abstraction". The OO
view of PSA is kinda lame if you ask me. It does not factor out common
"database verbs" into a single tool or convention, but reinvents it
over and over for many classes. Repetitive SET/GET syndrome is an
example of this poor pattern factoring. OO'ers often don't see this
ugly, ugly duplication of concept.
I am talking about abstracting the domain where the original problem
exists rather than the computing domain where a software solution will
be executed. SQL only abstracts a very narrow part of the computing domain.
I disagree. A large part of *most* apps I have seen involves
database-oriented stuff. P. May mentioned security. Security can be
viewed as a dealing with large ACL tables. Most algorithms can be
reduced to mostly DB-oriented operations. I had to build a 3D graphics
system in college, and most of it could be reduced to DB-operations:
having "parts" reference each other in many-to-many tables,
transformation steps tracking, looking up polygons, cross-referencing
those polygons with their "parent part", storing scan-lines for later
inspection, etc. I will agree that DB's are not (currently) fast at
such, but still from a logical perspective the operations were
essentially DB-oriented. (Because I couldn't use a DB, I ended up
reinventing a lot of DB idioms and it was not very fun.)
When the only tool you have is a Hammer, then everything looks like a
Nail. The FP people will be happy to tell you that they can solve those
problems quite elegantly without any state variables, much less tables
of data. Similarly, the OO paradigm allows one to solve those problems
with quite different abstractions.

[FYI, the most comprehensive book on building UML Class Diagrams is Leon
Starr's "Executable UML: How to Build Class Models". In that book Leon
repeatedly uses tables as analogues for validating class instances. In
effect he is using them as a tool for normalization. But the
identification of classes and properties is all done in an OO fashion via
abstraction for the problem space. More important, the management of
relationships and collaborations is very different than the RDB paradigm
(e.g., one almost never selects from a collection of all instances and
the notion of a multi-table join is virtually nonexistent in an OO
application). IOW, there is a mapping to tables but the construction
paradigm is not at all driven by a table view.]
Post by topmind
A compiler/interpreter is the same way: a DB would make it simpler to
implement, especially fancy interactive debuggers, but performance/RAM
issues make it not practical. Thus, one spends a lot of time coding
array and pointer hopping to look stuff up, count things, etc. Most of
it is just tracking and associating "stuff".
I don't think that is a very good analogy. A compiler/interpreter
applies exactly the same set of rules to transforming the input model to
machine language on the platform de jour. IOW, the same
compiler/interpreter will do exactly the same thing in any shop having
the same hardware for any customers.

The issue here is letting the DBMS apply business rules and policies
that are specific to a particular customer or, worse, that are specific
to a particular problem. Then one is not separating concerns and one is
bleeding cohesion; the penalty is a high cost in maintenance.
Post by topmind
Post by H. S. Lahman
<moved>What I implied was that CRUD/USER applications tend to be not very
complex. Report generation was never very taxing even back in the COBOL
days, long before SQL, RDBs, or even the RDM. Substituting a GUI or
browser UI for a printed report doesn't change the fundamental nature of
the processing.
Please clarify. If a process was "not taxing", then you are simply
given more duties and projects to work on. Management loads you up
based on your productivity and work-load.
Back in the '60s and early '70s writing COBOL code to extract data and
format reports was a task given to the entry level programmers. That's
where the USER acronym (Update, Sort, Extract, Report) came from. The
stars went on to coding Payroll and Inventory Control where one had to
encode business rules and policies to solve specific problems.

In CRUD/USER processing the customer is the one actually solving
problems by interpreting the data. The developer is just formatting the
data to support the customer. Replace printouts with GUIs or browsers
and the customer is still solving the real problem. It is only when the
problem solution gets drawn into the software that one leaves the realm
of CRUD/USER processing and things start to get tricky.
Post by topmind
Post by H. S. Lahman
Unfortunately, I agree with May that the rest of the paragraph makes no
sense; it just seems to be your personal jargon and mantras.
If May said anything else, we should check his temperature. He has
never been friendly to me or my ideas. A polite person would simply ask
for clarification rather than label it "jargon or mantras". You
wouldn't do that to your boss, would you?
We've been here before. It's part of your shtick. You define your own
terms, like 'noun-based' and 'verb-based', and then throw them out on
the table so that your opponent sees them as non sequiturs. You also
keep throwing out the same slogans as your web site. They are part of
the predictable collection of forensic ploys you use when debating OO
people. It's all designed to have an emotional effect to put the
opponent on tilt.

You seem to get your amusement out of having OO people go bonkers.
Seeing how far down the rabbit hole you can drag them is a sport to you.
You even keep trying to get me involved in such debates, even though I
insist on not biting. Do you think that I haven't noticed that this
subthread started off as a discussion about DBMS vs. application
responsibilities but you keep trying to push it into an OO debate? The
entire paragraph in question was a blatant attempt to deflect from the
DBMS issues to OO issues.

BTW, I would and have said similar things to various bosses.
Post by topmind
Post by H. S. Lahman
Post by topmind
Post by H. S. Lahman
Note that UML static diagrams also implement the RDM
but they do so in a very different way than the RDBs do.
UML takes us back to the navigational/CODASYL pointer/path hell that
proved a mess in the 60's and 70's. UML and OO is the structural GO TO
of the modern age.
You keep trying but I'm still not biting. This is just another of your
stock ploys to pull people's chains. You throw out a deliberately
baseless and nonsensical assertion just to create emotional controversy
to drag people down your anti-OO rabbit hole.
You are so cute when you paint me as bad, manipulative, and evil.
Not bad or evil, but definitely manipulative. You just find it amusing
to pull people's chains and the OO community is providing plenty of soft
targets. As I've said before, I think you actually know a lot more
about OO development than you let on and you are pretty clever about the
way you tweak the OO people who engage with you.


*************
There is nothing wrong with me that could
not be cured by a capful of Drano.

H. S. Lahman
***@pathfindermda.com
Pathfinder Solutions -- Put MDA to Work
http://www.pathfindermda.com
blog: http://pathfinderpeople.blogs.com/hslahman
(888)OOA-PATH
topmind
2006-01-16 21:29:37 UTC
Permalink
Post by H. S. Lahman
Responding to Jacobs...
Post by topmind
Post by H. S. Lahman
The point is that there are alternative /implementations/ for
persistence to RDBs in the computing space. SQL has already made that
implementation choice.
SQL is not an implementation. What is the difference between locking
yourself to SQL instead of locking yourself to Java? If you want
open-source, then go with PostgreSQL. What is the diff? Java ain't no
universal language either.
Of course it's an implementation! It implements access to physical
storage.
Just as Java implements access to physical RAM etc.
Post by H. S. Lahman
More important to the context here, that implementation is
quite specific to one single paradigm for stored data.
Any language or API is pretty much going to target a specific paradigm
or two. I don't see any magic way around this, at least not that you
offer. UML is no different.
Post by H. S. Lahman
Requirements -> 4GL -> 3GL -> Assembly -> machine code executable
Everything on the left is a specification for what is immediately to its
right. Similarly, everything to the right is a solution implementation
for the specification on its immediate left.
Well, that is a bit outdated. For one, the distinction between 4GL and
3GL is fuzzy, and many compilers/interpreters don't use assembler.
Post by H. S. Lahman
Go look at an SA/D Data Flow Diagram or a UML Activity Diagram. They
express data store access at a high level of abstraction that is
independent of the actual storage mechanism. SQL, ISAM, CODASYL, gets,
or any other access mechanism, is then an implementation of that generic
access specification.
SQL is independent of the "actual storage mechanism". It is an
interface. You may not like the interface, but that is another matter.
Repeat after me: "SQL is an interface, SQL is an interface, SQL is an
interface"....

Plus, what such UML models often do is something like:

method getGreenScarvesCostingLessThan100dollars(...) {
    sql = "select * from products where prod='scarves' and color='green'
           and price < 100"
    return(foo.execute(sql))
}

You simply wrap everything with long method names and call it
"abstraction" and pat yourselves on the back. However, it is not really
abstraction because often they are called from only ONE place. That
just contributes to code bloat and red-tape-code.

Plus one could do the same with functions if you really want such red
tape. It is simply giving a specific unit of behavior a name. Some
people want to name every nut and bolt.
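For the record, the pattern being criticized looks roughly like this when written out in full. A hypothetical sketch in Python with SQLite; the products table and the method name are illustrative, not from any real codebase:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (prod TEXT, color TEXT, price REAL)")
conn.executemany("INSERT INTO products VALUES (?, ?, ?)",
                 [("scarves", "green", 80), ("scarves", "green", 120),
                  ("scarves", "red", 50)])

# The style under discussion: a long-named wrapper around one query,
# typically called from exactly one place.
def get_green_scarves_costing_less_than_100_dollars(conn):
    sql = ("SELECT * FROM products WHERE prod = 'scarves' "
           "AND color = 'green' AND price < 100")
    return conn.execute(sql).fetchall()

rows = get_green_scarves_costing_less_than_100_dollars(conn)
print(rows)  # → [('scarves', 'green', 80.0)]
```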
Post by H. S. Lahman
Java is certainly a general purpose 3GL. Like most 3GLs there are
situations where there are better choices (e.g., lack of BCD arithmetic
support makes it a poor choice for a General Ledger), but one could
still use it in those situations. SQL, in contrast, is a niche language
that just doesn't work for many situations outside its niche.
You could be right, but I have yet to see a good case outside of
split-second timing issues where there is a limit to the max allowed
response time. (This does not mean that rdbms are "slow", just less
predictable WRT response time.)

If you can give an example outside of timing, please do. (I don't doubt
they exist, but I bet they are rarer than you imply. Some scientific
applications that use imaginary numbers and lots of calculus may also
fall outside.)
Post by H. S. Lahman
BTW, remember that I am a translationist. When I do a UML model, I
don't care what language the transformation engine targets in the model
implementation. (In fact, transformation engines for R-T/E typically
target straight C from the OOA models for performance reasons.) Thus
every 3GL (or Assembly) represents a viable alternative implementation
of the notion of '3GL'.
Well, UML *is* a language. It is a visual language just like LabView is.
Post by H. S. Lahman
Post by topmind
Post by H. S. Lahman
Post by topmind
Post by H. S. Lahman
UML with a compliant AAL is an example of a 4GL. If I build an OOA
model for, say, a POS Order Entry System, that model can be
unambiguously implemented without change either manually as a print mail
order catalogue or as software for a browser-based 'net application.
The fundamental processing logic of catalogue organization and order
entry is expressed the same way regardless of the implementation context.
And if other people/vendors made their own flavor of this tool with
differences between the implementation, then it would be in the same
boat. Why should implementation A1 and A2 demote the "generation"
ranking of A?
It is not the same thing at all. The 4GL solution does not care if
persistence is /implemented/ with RDBs, OODBs, flat files, paper files,
or clay tablets.
For the zillionth time, RDBMS are far more than just "persistence".
It is only if one refuses to manage complexity by separating logical
concerns.
"Separation" is generally irrelevant in cyber-land. It is a physical
concept, not a logical one. Perhaps you mean "isolatable", which can be
made to be dynamic, based on needs. "Isolatable" means that there is
enough info to produce a separated *view* if and when needed. This is
the nice thing about DB's: you don't have to have One-and-only-one
separation/taxonomy up front. OO tends to want one-taxonomy-fits-all
and tries to find the One True Taxonomy, which is the fast train to
Messland. Use the virtual power of computers to compute as-needed
groupings based on metadata.
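The "as-needed groupings" idea maps directly onto SQL views, which derive a separation on demand instead of fixing one taxonomy up front. A minimal sketch, assuming an in-memory SQLite database with an invented employee table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (name TEXT, dept TEXT, role TEXT);
    INSERT INTO employee VALUES
        ('ann',  'sales', 'manager'),
        ('bob',  'sales', 'rep'),
        ('carl', 'it',    'manager');

    -- Two different "separations" of the same rows, derived on demand
    -- rather than fixed up front as one taxonomy.
    CREATE VIEW by_dept AS
        SELECT dept, COUNT(*) AS headcount FROM employee GROUP BY dept;
    CREATE VIEW managers AS
        SELECT name FROM employee WHERE role = 'manager';
""")
by_dept = conn.execute("SELECT * FROM by_dept ORDER BY dept").fetchall()
managers = conn.execute("SELECT * FROM managers ORDER BY name").fetchall()
print(by_dept)   # → [('it', 1), ('sales', 2)]
print(managers)  # → [('ann',), ('carl',)]
```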

This is also why UML sucks: Either you have a jillion diagrams of the
same thing in order to provide all the potential viewpoints that
different users and developers will need, or you have to force a
limited taxonomy on developers. UML does not provide this
virtualization capability.
Post by H. S. Lahman
Render unto the Disk generic static storage and render unto
the Application context-dependent dynamics.
* 1
[Context] ----------------- [Data]

1 *
[Problem Solution] -------- [Data]
The first view is the basis of the RDB paradigm -- generic storage of
the same data for access by many different contexts. The second view is
the one that is relevant for solving large problems -- access of data
that is carefully tailored to the problem in hand. Storing and
accessing data for many different contexts is a quite different problem
than formatting and manipulating data to solve a specific problem.
Again, DB's are not JUST for "storage". There are RAM-only RDBMS's.
They provide services for "attribute modeling" for lack of a better
term. They are essentially attribute modeling tools, not disk managers.
OO'ers would rather reinvent their own attribute managers with
different characteristics for each implementation.

RDBMS simply "encapsulate" commonly-used attribute management idioms
into a standard (or at least semi-standard).
Post by H. S. Lahman
For a non-CRUD/USER application, SQL and the DBMS provide the first
relationship while a persistence access subsystem provides the
reformatting for the second relationship.
Reformatting? Please clarify.
Post by H. S. Lahman
Post by topmind
Post by H. S. Lahman
Post by topmind
Post by H. S. Lahman
SQL does support abstraction in the sense that it abstracts the RDB
implementation. However, here I was referring to problem space abstraction.
This depends on how one defines "problem space abstraction". The OO
view of PSA is kinda lame if you ask me. It does not factor out common
"database verbs" into a single tool or convention, but reinvents it
over and over for many classes. Repetitive SET/GET syndrome is an
example of this poor pattern factoring. OO'ers often don't see this
ugly, ugly duplication of concept.
I am talking about abstracting the domain where the original problem
exists rather than the computing domain where a software solution will
be executed. SQL only abstracts a very narrow part of the computing domain.
I disagree. A large part of *most* apps I have seen involves
database-oriented stuff. P. May mentioned security. Security can be
viewed as a dealing with large ACL tables. Most algorithms can be
reduced to mostly DB-oriented operations. I had to build a 3D graphics
system in college, and most of it could be reduced to DB-operations:
having "parts" reference each other in many-to-many tables,
transformation steps tracking, looking up polygons, cross-referencing
those polygons with their "parent part", storing scan-lines for later
inspection, etc. I will agree that DB's are not (currently) fast at
such, but still from a logical perspective the operations were
essentially DB-oriented. (Because I couldn't use a DB, I ended up
reinventing a lot of DB idioms and it was not very fun.)
When the only tool you have is a Hammer, then everything looks like a
Nail.
No, out of necessity I started my career without DB usage, and I never
want to return there.
Post by H. S. Lahman
The FP people will be happy to tell you that they can solve those
problems quite elegantly without any state variables, much less tables
of data.
Well, I have asked for a practical demonstration in the past, and they
couldn't provide one that showed more than about a 5% difference,
making up pretty lame excuses in the process. I won't claim that my
favorite approaches are objectively better. However, they have not been
shown objectively worse.
Post by H. S. Lahman
Similarly, the OO paradigm allows one to solve those problems
with quite different abstractions.
[FYI, the most comprehensive book on building UML Class Diagrams is Leon
Starr's "Executable UML: How to Build Class Models". In that book Leon
repeatedly uses tables as analogues for validating class instances. In
effect he is using them as a tool for normalization. But the
identification of classes and properties is all done in an OO fashion via
abstraction for the problem space. More important, the management of
relationships and collaborations is very different than the RDB paradigm
(e.g., one almost never selects from a collection of all instances and
the notion of a multi-table join is virtually nonexistent in an OO
application). IOW, there is a mapping to tables but the construction
paradigm is not at all driven by a table view.]
How is replacing a table view with a class view better? Sure, schemas
are messy at many shops, but given time the OO'ers will probably have
messy classes also. The same motivation (or lack of) for slop is part
of human or management nature, not part of the paradigm. No paradigm
has been shown to FORCE good practices. At least relational provides
tools to keep schemas fairly clean (normalization rules). OO has no
decent normalization model, tacking on more and more classes and
methods like a shanty town living for the moment (out of necessity
perhaps).
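To make the normalization point concrete: moving a repeated fact into its own table turns a mass update into a single-row update. A hypothetical sketch in Python/SQLite; the customer/order schema is invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Normalized: the customer's city is stated once, keyed by customer,
    -- instead of being repeated on every order row (an update anomaly).
    CREATE TABLE customer (name TEXT PRIMARY KEY, city TEXT);
    CREATE TABLE "order" (order_id INTEGER PRIMARY KEY,
                          customer TEXT REFERENCES customer(name));
    INSERT INTO customer VALUES ('acme', 'boston');
    INSERT INTO "order" VALUES (1, 'acme'), (2, 'acme');
""")
# A customer move now touches one row, not every order:
conn.execute("UPDATE customer SET city = 'austin' WHERE name = 'acme'")
rows = conn.execute("""
    SELECT o.order_id, c.city FROM "order" o
    JOIN customer c ON c.name = o.customer ORDER BY o.order_id
""").fetchall()
print(rows)  # → [(1, 'austin'), (2, 'austin')]
```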

Regarding joins, some dialects of SQL allow tables to be optionally
joined without explicitly mentioning the relationships. I agree that
SQL has many flaws. But it still beats the existing alternatives.
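One example of joining "without explicitly mentioning the relationships" is NATURAL JOIN, which pairs rows on same-named columns. A minimal SQLite sketch; the dept/emp schema is invented:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dept (dept_id INTEGER, dept_name TEXT);
    CREATE TABLE emp  (emp_name TEXT, dept_id INTEGER);
    INSERT INTO dept VALUES (1, 'sales');
    INSERT INTO emp  VALUES ('ann', 1);
""")
# NATURAL JOIN pairs rows on the shared column name (dept_id)
# without the join condition being spelled out.
rows = conn.execute(
    "SELECT emp_name, dept_name FROM emp NATURAL JOIN dept").fetchall()
print(rows)  # → [('ann', 'sales')]
```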
Post by H. S. Lahman
Post by topmind
A compiler/interpreter is the same way: a DB would make it simpler to
implement, especially fancy interactive debuggers, but performance/RAM
issues make it not practical. Thus, one spends a lot of time coding
array and pointer hopping to look stuff up, count things, etc. Most of
it is just tracking and associating "stuff".
I don't think that is a very good analogy. A compiler/interpreter
applies exactly the same set of rules to transforming the input model to
machine language on the platform de jour. IOW, the same
compiler/interpreter will do exactly the same thing in any shop having
the same hardware for any customers.
The issue here is letting the DBMS apply business rules and policies
that are specific to a particular customer or, worse, that are specific
to a particular problem. Then one is not separating concerns and one is
bleeding cohesion; the penalty is a high cost in maintenance.
I would like to see an example. "Separation of concerns" is why data
issues are managed differently than behavior issues, by the way. You
are narrowing your usage of "separation" to your pet approaches.

You see, data is more "mathable" than behavior at this point in
history. Behavior has resisted attempts to have useful and consistent
idioms applied to it. Relational reins in the pointer-like spaghetti
of OO and UML by putting at least the data side into a standard
attribute processing and management system. OO/UML drag both data and
behavior back to the navigational spaghetti dark ages of the 1960's. It
drags software down to the yet-to-be-mathed behavior realm.
Post by H. S. Lahman
Post by topmind
Post by H. S. Lahman
<moved>What I implied was that CRUD/USER applications tend to be not very
complex. Report generation was never very taxing even back in the COBOL
days, long before SQL, RDBs, or even the RDM. Substituting a GUI or
browser UI for a printed report doesn't change the fundamental nature of
the processing.
Please clarify. If a process was "not taxing", then you are simply
given more duties and projects to work on. Management loads you up
based on your productivity and work-load.
Back in the '60s and early '70s writing COBOL code to extract data and
format reports was a task given to the entry level programmers. That's
where the USER acronym (Update, Sort, Extract, Report) came from. The
stars went on to coding Payroll and Inventory Control where one had to
encode business rules and policies to solve specific problems.
Fine, show how OO better solves business rule management. (Many if not
most biz rules can be encoded as data, BTW, if you know how.)
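The "rules as data" claim can be sketched as a table-driven policy: the rules live in rows that could just as well sit in an RDBMS table, and the code shrinks to a generic lookup. A hypothetical discount policy, invented purely for illustration:

```python
# A hypothetical discount policy held as data rows (which could equally
# live in an RDBMS table). Each rule: (customer_type, minimum_total, rate).
DISCOUNT_RULES = [
    ("wholesale", 1000, 0.15),
    ("wholesale",    0, 0.10),
    ("retail",     500, 0.05),
    ("retail",       0, 0.00),
]

def discount_for(customer_type, order_total):
    # Rules are ordered most-specific first; the first match wins.
    for ctype, minimum, rate in DISCOUNT_RULES:
        if ctype == customer_type and order_total >= minimum:
            return rate
    return 0.0

print(discount_for("wholesale", 1500))  # → 0.15
print(discount_for("retail", 200))      # → 0.0
```

Changing the policy then means editing rows, not code.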

Also, part of the reason for starting out with reports etc. is that it
takes a while to learn the domain. Interface-level stuff (screens and
reports) can be learned from semi-generic courses and books. The
specific business rarely has good documentation and must be learned by
*exposure*. Starting out with UI allows one time to get such exposure.
There is no "domain school", or "Learn Bob's Kite Business in 21 Days"
books.

It is largely a matter of where training comes easiest from.
Post by H. S. Lahman
In CRUD/USER processing the customer is the one actually solving
problems by interpreting the data. The developer is just formatting the
data to support the customer. Replace printouts with GUIs or browsers
and the customer is still solving the real problem.
To some extent, yes, but users are usually crappy at factoring, such
that if you take everything they say literally you will end up with
duplication of concepts and info up the wazoo. Part of an analyst's
job is to clarify the biz rules and then educate the customer about
the benefits, changes, options, and corrections that normalization will
require.
Post by H. S. Lahman
It is only when the
problem solution gets drawn into the software that one leaves the realm
of CRUD/USER processing and things start to get tricky.
Post by topmind
Post by H. S. Lahman
Unfortunately, I agree with May that the rest of the paragraph makes no
sense; it just seems to be your personal jargon and mantras.
If May said anything else, we should check his temperature. He has
never been friendly to me or my ideas. A polite person would simply ask
for clarification rather than label it "jargon or mantras". You
wouldn't do that to your boss, would you?
We've been here before. It's part of your shtick. You define your own
terms, like 'noun-based' and 'verb-based', and then throw them out on
the table so that your opponent sees them as non sequiturs. You also
keep throwing out the same slogans as your web site.
That is called "reuse".
Post by H. S. Lahman
They are part of
the predictable collection of forensic ploys you use when debating OO
people. It's all designed to have an emotional effect to put the
opponent on tilt.
You seem to get your amusement out of having OO people go bonkers.
I will admit there is a certain satisfaction of using other people's
own logic against themselves, especially if they have insulted me
prior.
Post by H. S. Lahman
Seeing how far down the rabbit hole you can drag them is a sport to you.
You even keep trying to get me involved in such debates, even though I
insist on not biting. Do you think that I haven't noticed that this
subthread started off as a discussion about DBMS vs. application
responsibilities but you keep trying to push it into an OO debate? The
entire paragraph in question was a blatant attempt to deflect from the
DBMS issues to OO issues.
OO'ers in general do not like RDBMS and would rather wrap them
with OO wrappers so that they don't have to deal with them directly.
But this is not abstraction, it is translating horizontally from what
you hate to what you like. In their heads RDBMS are (incorrectly)
viewed as low-level, assembler-like tools.

I am simply correcting this myth. They are a very powerful abstraction
and modeling tool IF you know how to use them right. OO is in many ways
lower level because it does not include many DB idioms and reinvents
them the hard way from scratch each time they are needed. Putting
long-named methods around everything does not improve abstraction, only
creates bureaucracy.
Post by H. S. Lahman
BTW, I would and have said similar things to various bosses.
I don't know whether to label that as brave or foolish.
Post by H. S. Lahman
Post by topmind
Post by H. S. Lahman
Post by topmind
Post by H. S. Lahman
Note that UML static diagrams also implement the RDM
but they do so in a very different way than the RDBs do.
UML takes us back to the navigational/CODASYL pointer/path hell that
proved a mess in the 60's and 70's. UML and OO is the structural GO TO
of the modern age.
You keep trying but I'm still not biting. This is just another of your
stock ploys to pull people's chains. You throw out a deliberately
baseless and nonsensical assertion just to create emotional controversy
to drag people down your anti-OO rabbit hole.
You are so cute when you paint me as bad, manipulative, and evil.
Not bad or evil, but definitely manipulative. You just find it amusing
to pull people's chains and the OO community is providing plenty of soft
targets. As I've said before, I think you actually know a lot more
about OO development than you let on and you are pretty clever about the
way you tweak the OO people who engage with you.
You are spreading falsehoods about RDBMS. They are NOT low-level. You
only treat them as low level.

Otherwise, show me with code examples that they are inherently
low-level. Why should anybody take your word for it? Show me the code!
(or UML) Enough of this brochure-talk.

Show me UML/OO being better. Show me how wrapping everything under the
sun with long-named methods/functions, even if referenced only once,
makes for a better system.

Your bashing of relational is evidence-free so far.
Post by H. S. Lahman
*************
There is nothing wrong with me that could
not be cured by a capful of Drano.
H. S. Lahman
-T-
H. S. Lahman
2006-01-17 18:55:57 UTC
Permalink
Responding to Jacobs...
Post by topmind
Post by H. S. Lahman
Post by topmind
Post by H. S. Lahman
The point is that there are alternative /implementations/ for
persistence to RDBs in the computing space. SQL has already made that
implementation choice.
SQL is not an implementation. What is the difference between locking
yourself to SQL instead of locking yourself to Java? If you want
open-source, then go with PostgreSQL. What is the diff? Java ain't no
universal language either.
Of course it's an implementation! It implements access to physical
storage.
Just as Java implements access to physical RAM etc.
Exactly. Java is a specific implementation of a 3GL. 3GL is the
abstraction, Java is an implementation. Persistence access is the
abstraction, SQL is an implementation.
Post by topmind
Post by H. S. Lahman
More important to the context here, that implementation is
quite specific to one single paradigm for stored data.
Any language or API is pretty much going to target a specific paradigm
or two. I don't see any magic way around this, at least not that you
offer. UML is no different.
4GLs get around it because they are independent of /all/ computing space
implementations.

However, that's not the point. SQL is a 3GL but comparing it to Java is
specious because Java is a general purpose 3GL. SQL represents a
solution to persistence access that is designed around a particular
model of persistence itself. So one can't even use it for general
purpose access to persistence, much less general computing.
Post by topmind
Post by H. S. Lahman
Requirements -> 4GL -> 3GL -> Assembly -> machine code executable
Everything on the left is a specification for what is immediately to its
right. Similarly, everything to the right is a solution implementation
for the specification on its immediate left.
Well that is a bit outdated. For one, the distinction between 4GL and
3GL is fuzzy, and many compilers/interpreters don't use assembler.
My 4GL definition isn't ambiguous, which is why I like it. Reviewers of
OOA models have no difficulty recognizing implementation pollution.

All compilers generate object code (relocatable Assembly). Most modern
interpreters can produce storable bytecodes that are equivalent to
Assembly from the VM's viewpoint. At run time one can view an
interpreter as simply combining link and load functions that transform
the bytecode to a machine instruction. But at some level the
interpreter still has to understand that MUL,R1,R2 maps into bits.

But you are reverting to ploys again by deflecting. The context is
specification vs. implementation, not how machine instructions are encoded.
Post by topmind
Post by H. S. Lahman
Go look at an SA/D Data Flow Diagram or a UML Activity Diagram. They
express data store access at a high level of abstraction that is
independent of the actual storage mechanism. SQL, ISAM, CODASYL, gets,
or any other access mechanism, is then an implementation of that generic
access specification.
SQL is independent of the "actual storage mechanism". It is an
interface. You may not like the interface, but that is another matter.
Repeat after me: "SQL is an interface, SQL is an interface, SQL is an
interface"....
Try using SQL vs. flat files if you think it is independent of the
actual storage mechanism. (Actually, you probably could if the flat
files happened to be normalized to the RDM, but the SQL engine would be
a doozy and would have to be tailored locally to the files.) SQL
implements the RDB view of persistence and only the RDB view.
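The disputed point can be tested concretely: SQLite accepts the identical SQL text whether the backing store is a disk file or RAM, which supports the "SQL is an interface" reading, even though the engine still presupposes the relational table model Lahman describes. A minimal sketch in Python (table and data invented for illustration):

```python
import os
import sqlite3
import tempfile

def count_green(conn):
    # The SQL text is identical regardless of where the engine keeps the data.
    conn.execute("CREATE TABLE products (prod TEXT, color TEXT, price REAL)")
    conn.execute("INSERT INTO products VALUES ('scarves', 'green', 50.0)")
    return conn.execute(
        "SELECT COUNT(*) FROM products WHERE color = 'green'").fetchone()[0]

in_memory = count_green(sqlite3.connect(":memory:"))   # RAM-backed store
path = os.path.join(tempfile.mkdtemp(), "demo.db")
on_disk = count_green(sqlite3.connect(path))           # file-backed store
print(in_memory, on_disk)  # 1 1
```

Both connections answer the same statement the same way; what SQL cannot hide is the assumption that the data is organized as normalized tables in the first place.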
Post by topmind
method getGreenScarvesCostingLessThan100dollars(...) {
sql = "select * from products where prod='scarves' and color='green'
and price < 100"
return(foo.execute(sql))
}
You simply wrap everything with long method names and call it
"abstraction" and pat yourselves on the back. However, it is not really
abstraction because often they are called from only ONE place. That
just contributes to code bloat and red-tape-code.
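Jacobs's bloat complaint can be made concrete: the single-use wrapper above freezes its criteria into its name, while one parameterized query covers every combination. A sketch in Python with sqlite3 (schema and function names hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (prod TEXT, color TEXT, price REAL)")
conn.executemany("INSERT INTO products VALUES (?, ?, ?)",
                 [("scarves", "green", 50.0), ("scarves", "red", 120.0)])

# One parameterized query replaces a whole family of long-named wrappers
# such as getGreenScarvesCostingLessThan100dollars().
def find_products(conn, prod, color, max_price):
    return conn.execute(
        "SELECT * FROM products WHERE prod = ? AND color = ? AND price < ?",
        (prod, color, max_price)).fetchall()

rows = find_products(conn, "scarves", "green", 100)
print(rows)  # [('scarves', 'green', 50.0)]
```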
Come on, Bryce, you know this ploy isn't going to work on me. B-) You
create a non sequitur example that ignores OO relationship navigation
principles and follow it up with a set of outlandish assertions about
what OO is about. I refuse to join you in your rabbit hole.
Post by topmind
Post by H. S. Lahman
Java is certainly a general purpose 3GL. Like most 3GLs there are
situations where there are better choices (e.g., lack of BCD arithmetic
support makes it a poor choice for a General Ledger), but one could
still use it in those situations. SQL, in contrast, is a niche language
that just doesn't work for many situations outside its niche.
You could be right, but I have yet to see a good case outside of
split-second timing issues where there is a limit to the max allowed
response time. (This does not mean that rdbms are "slow", just less
predictable WRT response time.)
If you can give an example outside of timing, please do. (I don't doubt
they exist, but I bet they are rarer than you imply. Some scientific
applications that use imaginary numbers and lots of calculus may also
fall outside.)
Compute a logarithm. You can't hedge by dismissing "scientific"
computations. Try doing forecasting in an inventory control system w/o
"scientific" computations. Or try encoding the pattern recognition that
the user of a CRUD/USER application applies to the presented data. The
reality is that IT is now solving a bunch of problems that are
computationally intensive.
Post by topmind
Post by H. S. Lahman
BTW, remember that I am a translationist. When I do a UML model, I
don't care what language the transformation engine targets in the model
implementation. (In fact, transformation engines for R-T/E typically
target straight C from the OOA models for performance reasons.) Thus
every 3GL (or Assembly) represents a viable alternative implementation
of the notion of '3GL'.
Well, UML *is* a language. It is a visual language just like LabView is.
Exactly. But solutions at the OOA level are 4GLs because they can be
unambiguously implemented without change on any platform with any
computing technologies.
Post by topmind
Post by H. S. Lahman
Post by topmind
Post by H. S. Lahman
Post by topmind
Post by H. S. Lahman
UML with a compliant AAL is an example of a 4GL. If I build an OOA
model for, say, a POS Order Entry System, that model can be
unambiguously implemented without change either manually as a print mail
order catalogue or as software for a browser-based 'net application.
The fundamental processing logic of catalogue organization and order
entry is expressed the same way regardless of the implementation context.
And if other people/vendors made their own flavor of this tool with
differences between the implementation, then it would be in the same
boat. Why should implementations A1 and A2 demote the "generation"
ranking of A?
It is not the same thing at all. The 4GL solution does not care if
persistence is /implemented/ with RDBs, OODBs, flat files, paper files,
or clay tablets.
For the zillionth time, RDBMS are far more than just "persistence".
It is only if one refuses to manage complexity by separating logical
concerns.
"Separation" is generally irrelevant in cyber-land. It is a physical
concept, not a logical one. Perhaps you mean "isolatable", which can be
made to be dynamic, based on needs. "Isolatable" means that there is
enough info to produce a separated *view* if and when needed. This is
the nice thing about DB's: you don't have to have One-and-only-one
separation/taxonomy up front. OO tends to want one-taxonomy-fits-all
and tries to find the One True Taxonomy, which is the fast train to
Messland. Use the virtual power of computers to compute as-needed
groupings based on metadata.
You know very well what I mean by 'separation of concerns' in a software
context, so don't waste our time recasting it. Modularity has been a
Good Practice since the late '50s.
Post by topmind
This is also why UML sucks: Either you have a jillion diagrams of the
same thing in order to provide all the potential viewpoints that
different users and developers will need, or you have to force a
limited taxonomy on developers. UML does not provide this
virtualization capability.
Again with the baseless assertions about what OO and OOA/D is about. I
have to give you an A for persistence. B-))
Post by topmind
Post by H. S. Lahman
Render unto the Disk generic static storage and render unto
the Application context-dependent dynamics.
* 1
[Context] ----------------- [Data]
1 *
[Problem Solution] -------- [Data]
The first view is the basis of the RDB paradigm -- generic storage of
the same data for access by many different contexts. The second view is
the one that is relevant for solving large problems -- access of data
that is carefully tailored to the problem in hand. Storing and
accessing data for many different contexts is a quite different problem
than formatting and manipulating data to solve a specific problem.
Again, DB's are not JUST for "storage". There are RAM-only RDBMS's.
I agree they are used for more than that, but it is not my problem if
developers are determined to shoot themselves in the foot by bleeding
cohesion all over the place. It is plain bad software practice to
ignore logical modularity.

As far as RAM RDBs go, for any large non-CRUD/USER problem I can
formulate a solution (which doesn't have to be OO) that will beat your
RAM RDB for performance, and often by integer factors. The RDB paradigm
is not designed for context-dependent problem solving; it is designed
for generic static data storage and access in a context-independent manner.

Before you argue that the RAM RDB saves developer effort because it is
largely reusable and that may be worth more than performance, I agree.
But IME for /large/ non-CRUD/USER problems the computer is usually too
small and performance is critical.

[I could also argue that an OO solution will provide one with optimum
performance "for free" because it falls out of basic OOA/D for the
solution logic. IOW, one doesn't need that sort of reuse. But I won't
argue that because that would be going down the rabbit hole. B-)]
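Lahman's performance claim is at least plausible in the small: a structure tailored to one access pattern answers in constant time what a generic row scan answers in linear time. A toy sketch, not a benchmark (data invented):

```python
# A problem-tailored structure (hash index) versus a generic scan over
# context-independent rows -- the trade-off being argued about here.
rows = [("part-%d" % i, i * 1.5) for i in range(10000)]

def scan(rows, name):
    # Generic, context-independent access: examine every row.
    return [price for (n, price) in rows if n == name]

# Tailored to the one question the solution actually asks.
index = {name: price for name, price in rows}

assert scan(rows, "part-42") == [63.0]   # O(n) per lookup
assert index["part-42"] == 63.0          # same answer, O(1) per lookup
```

The counter-argument from the thread also holds: the generic form is reusable for questions nobody anticipated, which is exactly what the tailored index gives up.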
Post by topmind
Post by H. S. Lahman
For a non-CRUD/USER application, SQL and the DBMS provide the first
relationship while a persistence access subsystem provides the
reformatting for the second relationship.
Reformatting? Please clarify.
The solution needs a different view of the data that is tailored to the
problem in hand. So the RDB view needs to be converted to the solution
view (and vice versa). IOW, one needs to reformat the RDB data
structures to the solution data structures.
Post by topmind
Post by H. S. Lahman
Post by topmind
Post by H. S. Lahman
I am talking about the abstracting the domain where the original problem
exists rather than the computing domain where a software solution will
be executed. SQL only abstracts a very narrow part of the computing domain.
I disagree. A large part of *most* apps I have seen involves
database-oriented stuff. P. May mentioned security. Security can be
viewed as a dealing with large ACL tables. Most algorithms can be
reduced to mostly DB-oriented operations. I had to build a 3D graphics
system having "parts" reference each other in many-to-many tables,
transformation steps tracking, looking up polygons, cross-referencing
those polygons with their "parent part", storing scan-lines for later
inspection, etc. I will agree that DB's are not (currently) fast at
such, but still from a logical perspective the operations were
essentially DB-oriented. (Because I couldn't use a DB, I ended up
reinventing a lot of DB idioms and it was not very fun.)
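The "security as ACL tables" view mentioned above can be sketched as a single relational membership test (table and column names hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE acl (user_id TEXT, resource TEXT, action TEXT)")
conn.executemany("INSERT INTO acl VALUES (?, ?, ?)",
                 [("alice", "report42", "read"),
                  ("bob", "report42", "write")])

def allowed(conn, user, resource, action):
    # Authorization reduces to membership in the ACL relation.
    row = conn.execute(
        "SELECT 1 FROM acl WHERE user_id=? AND resource=? AND action=?",
        (user, resource, action)).fetchone()
    return row is not None

print(allowed(conn, "alice", "report42", "read"))   # True
print(allowed(conn, "alice", "report42", "write"))  # False
```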
When the only tool you have is a Hammer, then everything looks like a
Nail.
No, out of necessity I started my career without DB usage, and I never
want to return there.
That's because you are in a CRUD/USER environment where P/R works quite
well. Try a problem like allocating a fixed marketing budget to various
national, state, and local media outlets in an optimal fashion for a
Fortune 500.
Post by topmind
Post by H. S. Lahman
Similarly, the OO paradigm allows one to solve those problems
with quite different abstractions.
[FYI, the most comprehensive book on building UML Class Diagrams is Leon
Starr's "Executable UML: How to Build Class Models". In that book Leon
repeatedly uses tables as analogues for validating class instances. In
effect he is using them as a tool for normalization. But the
identification of classes and properties is all done in a OO fashion via
abstraction for the problem space. More important, the management of
relationships and collaborations is very different than the RDB paradigm
(e.g., one almost never selects from a collection of all instances and
the notion of a multi-table join is virtually nonexistent in an OO
application). IOW, there is a mapping to tables but the construction
paradigm is not at all driven by a table view.]
How is replacing a table view with a class view better? Sure, schemas
are messy at many shops, but given time the OO'ers will probably have
messy classes also. The same motivation (or lack of) for slop is part
of human or management nature, not part of the paradigm. No paradigm
has been shown to FORCE good practices. At least relational provides
tools to keep schemas fairly clean (normalization rules). OO has no
decent normalization model, tacking on more and more classes and
methods like a shanty town living for the moment (out of necessity
perhaps).
You never give up. It's not about 'better'. Nor is this subthread
about OO vs. DBMSes. So I am not going to bite on these outrageous
assertions either. No normalization model? Get serious.
Post by topmind
Post by H. S. Lahman
Post by topmind
Post by H. S. Lahman
<moved>What I implied was that CRUD/USER applications tend to be not very
complex. Report generation was never very taxing even back in the COBOL
days, long before SQL, RDBs, or even the RDM. Substituting a GUI or
browser UI for a printed report doesn't change the fundamental nature of
the processing.
Please clarify. If a process was "not taxing", then you are simply
given more duties and projects to work on. Management loads you up
based on your productivity and work-load.
Back in the '60s and early '70s writing COBOL code to extract data and
format reports was a task given to the entry level programmers. That's
where the USER acronym (Update, Sort, Extract, Report) came from. The
stars went on to coding Payroll and Inventory Control where one had to
encode business rules and policies to solve specific problems.
Fine, show how OO better solves business rule management. (Many if not
most biz rules can be encoded as data, BTW, if you know how.)
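The parenthetical claim that most business rules can be encoded as data is the table-driven-rules idiom; a minimal sketch with an invented discount policy:

```python
# Discount policy kept as rows of data rather than hard-coded branches;
# changing policy means editing the table, not the program.
discount_rules = [
    # (customer_type, min_order_total, discount_rate), most specific first
    ("wholesale", 1000.0, 0.15),
    ("wholesale",    0.0, 0.10),
    ("retail",     500.0, 0.05),
    ("retail",       0.0, 0.00),
]

def discount_for(customer_type, order_total):
    # First matching row wins.
    for ctype, threshold, rate in discount_rules:
        if ctype == customer_type and order_total >= threshold:
            return rate
    return 0.0

print(discount_for("wholesale", 1200.0))  # 0.15
print(discount_for("retail", 100.0))      # 0.0
```

In a real system the rule rows would live in a database table, so the same query machinery that serves the application also serves policy maintenance.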
Why? That has nothing to do with whether a DBMS should execute dynamic
business rules and policies. This isn't an OO vs. P/R discussion, much
as you would like to make it so.
Post by topmind
Post by H. S. Lahman
It is only when the
problem solution gets drawn into the software that one leaves the realm
of CRUD/USER processing and thing start to get tricky.
Post by topmind
Post by H. S. Lahman
Unfortunately, I agree with May that the rest of the paragraph makes no
sense; it just seems to be your personal jargon and mantras.
If May said anything else, we should check his temperature. He has
never been friendly to me or my ideas. A polite person would simply ask
for clarification rather than label it "jargon or mantras". You
wouldn't do that to your boss, would you?
We've been here before. It's part of your shtick. You define your own
terms, like 'noun-based' and 'verb-based', and then throw them out on
the table so that your opponent sees them as non sequiturs. You also
keep throwing out the same slogans as your web site.
That is called "reuse".
LOL. I assume you mean reuse of forensic ploys.
Post by topmind
Post by H. S. Lahman
They are part of
the predictable collection of forensic ploys you use when debating OO
people. It's all designed to have an emotional effect to put the
opponent on tilt.
You seem to get your amusement out of having OO people go bonkers.
I will admit there is a certain satisfaction in using other people's
own logic against themselves, especially if they have insulted me
prior.
That doesn't answer why you went to the trouble of creating an
inflammatory website and have been here for years. A simple dislike of
OO? I don't think so. How many converts have you made to justify your
crusade? It just wouldn't be worth the effort of beating your head
against the wall all these years. So you have to have some other
reason. The only plausible reasons I see are Quixotic masochism or you
enjoy pulling people's chains.

As far as insulting you is concerned, what do you expect? You throw out
inflammatory statements, especially misconceptions about what OO
development is about, that are designed to drive anyone who understands
OO up a tree. If I used my knowledge of OO and tried to design a
website that would drive OO people to outrage, it would be your
geocities website. It pushes all the buttons in admirable fashion.
(That you can push all the right buttons is what makes me believe you
actually understand a lot more about OO than you pretend; it would be
difficult to be so inflammatory without that knowledge.) So I have to
conclude it is intentional. When you jump up and down on the bellows
long enough, you will get burned.
Post by topmind
Post by H. S. Lahman
Seeing how far down the rabbit hole you can drag them is a sport to you.
You even keep trying to get me involved in such debates, even though I
insist on not biting. Do you think that I haven't noticed that this
subthread started off as a discussion about DBMS vs. application
responsibilities but you keep trying to push it into an OO debate? The
entire paragraph in question was a blatant attempt to deflect from the
DBMS issues to OO issues.
OO'ers in general do not like RDBMS and would rather wrap them
with OO wrappers so that they don't have to deal with them directly.
But this is not abstraction, it is translating horizontally from what
you hate to what you like. In their head RDBMS are (incorrectly) viewed
as low-level
assembler-like tools.
Still pushing for a change of venue. This subthread has nothing to do
with OO development so I won't go there.
Post by topmind
Post by H. S. Lahman
BTW, I would and have said similar things to various bosses.
I don't know whether to label that as brave or foolish.
Indispensable works for me. B-)
Post by topmind
Post by H. S. Lahman
Post by topmind
You are so cute when you paint me as bad, manipulative, and evil.
Not bad or evil, but definitely manipulative. You just find it amusing
to pull people's chains and the OO community is providing plenty of soft
targets. As I've said before, I think you actually know a lot more
about OO development than you let on and you are pretty clever about the
way you tweak the OO people who engage with you.
You are spreading falsehoods about RDBMS. They are NOT low-level. You
only treat them as low level.
Where did I say that? I said that once one is out of the realm of
CRUD/USER processing, /talking/ to persistence is a low level service
_within the application_. How persistence is implemented outside the
application is a whole other story.


*************
There is nothing wrong with me that could
not be cured by a capful of Drano.

H. S. Lahman
***@pathfindermda.com
Pathfinder Solutions -- Put MDA to Work
http://www.pathfindermda.com
blog: http://pathfinderpeople.blogs.com/hslahman
(888)OOA-PATH
frebe
2006-01-18 08:12:32 UTC
Permalink
Post by H. S. Lahman
Exactly. Java is a specific implementation of a 3GL. 3GL is the
abstraction, Java is an implementation. Persistence access is the
abstraction, SQL is an implementation.
In that case, OO design is the abstraction and UML the implementation.
Post by H. S. Lahman
SQL represents a
solution to persistence access that is designed around a particular
model of persistence itself.
Since when is the relational model a "model of persistence". Can you
provide any pointer showing that the relational model is supposed to be
a "persistence model".
Post by H. S. Lahman
Try using SQL vs. flat files if you think it is independent of the
actual storage mechanism. (Actually, you probably could if the flat
files happened to be normalized to the RDM, but the SQL engine would be
a doozy and would have to be tailored locally to the files.) SQL
implements the RDB view of persistence and only the RDB view.
Yes, the files need to be normalized to the RDM, but why do you draw the
conclusion that SQL needs an RDB?
Post by H. S. Lahman
Exactly. But solutions at the OOA level are 4GLs because they can be
unambiguously implemented without change on any platform with any
computing technologies.
Sounds a little bit like Java and the JVM....
Post by H. S. Lahman
As far as RAM RDBs go, for any large non-CRUD/USER problem I can
formulate a solution (which doesn't have to be OO) that will beat your
RAM RDB for performance, and often by integer factors. The RDB paradigm
is not designed for context-dependent problem solving; it is designed
for generic static data storage and access in a context-independent manner.
For any problem I can formulate a C++ solution that will beat your
Java solution for performance. But I will still continue to use Java...

Fredrik Bertilsson
http://butler.sourceforge.net
H. S. Lahman
2006-01-18 16:21:13 UTC
Permalink
Responding to Frebe...
Post by frebe
Post by H. S. Lahman
Exactly. Java is a specific implementation of a 3GL. 3GL is the
abstraction, Java is an implementation. Persistence access is the
abstraction, SQL is an implementation.
In that case, OO design is the abstraction and UML the implementation.
Up to a point. UML is just a notation, not a design paradigm so it does
not implement OOA/D. However, UML is just one of many OOA/D notations
proposed, so it is an implementation of an OOA/D notation. Thus in
OMG's MOF meta hierarchy UML is, indeed, an implementation, and OMG
describes it exactly that way. (In fact, OMG uses four levels of
abstraction to describe the hierarchy of UML itself and three of those
are implementations of the higher level meta-models.)
Post by frebe
Post by H. S. Lahman
SQL represents a
solution to persistence access that is designed around a particular
model of persistence itself.
Since when is the relational model a "model of persistence". Can you
provide any pointer showing that the relational model is supposed to be
a "persistence model".
I didn't say that. The RDM is a model of static data so it can be
applied to UML Class Diagrams as well. Note that I was careful to say
that SQL is a solution to persistence /access/ when the data is
represented in RDB form. As you have pointed out elsewhere one could
create a RAM-based RDB and use SQL to access it with no persistence.
Post by frebe
Post by H. S. Lahman
Try using SQL vs. flat files if you think it is independent of the
actual storage mechanism. (Actually, you probably could if the flat
files happened to be normalized to the RDM, but the SQL engine would be
a doozy and would have to be tailored locally to the files.) SQL
implements the RDB view of persistence and only the RDB view.
Yes, the files need to be normalized to the RDM, but why do you draw the
conclusion that SQL needs an RDB?
SQL requires the data to be in tables and tuples with embedded identity.
The RDM, when applied in a broader context than Codd, does not require
that. SQL also assumes a very specific paradigm for navigating table
relationships.
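The "embedded identity" navigation paradigm described here is the foreign-key join: relationships are traversed by matching key values at query time rather than by following stored pointers, as a CODASYL-style network database would. A sketch (schema invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER);
INSERT INTO customers VALUES (1, 'Fredrik');
INSERT INTO orders VALUES (10, 1);
""")

# Navigation happens by value: the join matches embedded customer_id keys
# at query time; no pointer from order to customer is ever stored.
name = conn.execute("""
    SELECT c.name FROM orders o
    JOIN customers c ON c.customer_id = o.customer_id
    WHERE o.order_id = 10""").fetchone()[0]
print(name)  # Fredrik
```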
Post by frebe
Post by H. S. Lahman
Exactly. But solutions at the OOA level are 4GLs because they can be
unambiguously implemented without change on any platform with any
computing technologies.
Sounds a little bit like Java and the JVM....
Not quite. Choosing Java is already an implementation decision in the
computing environment. More important, one can't do a lot of useful
stuff in a Java program without explicitly invoking particular computing
space technologies (XML, EJB, TCP/IP, etc.). That's
because the level of abstraction of 3GLs is the same as those computing
space technologies.
Post by frebe
Post by H. S. Lahman
As far as RAM RDBs go, for any large non-CRUD/USER problem I can
formulate a solution (which doesn't have to be OO) that will beat your
RAM RDB for performance, and often by integer factors. The RDB paradigm
is not designed for context-dependent problem solving; it is designed
for generic static data storage and access in a context-independent manner.
For any problem I can formulate a C++ solution that will beat your
Java solution for performance. But I will still continue to use Java...
I can't buy the analogy. The analogy compares performance across
different implementations of the same solution. My point here is that
the solution itself is inappropriate, regardless of the language that it
is implemented in. No matter what language you choose I can always
provide an alternative /solution/ that will have better performance for
a specific complex problem.


*************
There is nothing wrong with me that could
not be cured by a capful of Drano.

H. S. Lahman
***@pathfindermda.com
Pathfinder Solutions -- Put MDA to Work
http://www.pathfindermda.com
blog: http://pathfinderpeople.blogs.com/hslahman
(888)OOA-PATH
frebe
2006-01-19 17:53:17 UTC
Permalink
Post by H. S. Lahman
Post by frebe
Post by H. S. Lahman
SQL represents a
solution to persistence access that is designed around a particular
model of persistence itself.
Since when is the relational model a "model of persistence". Can you
provide any pointer showing that the relational model is supposed to be
a "persistence model".
I didn't say that. The RDM is a model of static data so it can be
applied to UML Class Diagrams as well. Note that I was careful to say
that SQL is a solution to persistence /access/ when the data is
represented in RDB form. As you have pointed out elsewhere one could
create a RAM-based RDB and use SQL to access it with no persistence.
You agree that the relational model is not only about persistence? But
why is the SQL language limited to only the persistence features of the
relational model?
Post by H. S. Lahman
Post by frebe
Post by H. S. Lahman
Try using SQL vs. flat files if you think it is independent of the
actual storage mechanism. (Actually, you probably could if the flat
files happened to be normalized to the RDM, but the SQL engine would be
a doozy and would have to be tailored locally to the files.) SQL
implements the RDB view of persistence and only the RDB view.
Yes, the files need to be normalized to the RDM, but why do you draw the
conclusion that SQL needs an RDB?
SQL requires the data to be in tables and tuples with embedded identity.
SQL does not require the data to be in tables. The data may reside in
flat files or RAM structures. Just because the SQL language uses the
keyword "table" does not actually mean that it needs to be backed up by
a physical table. SQL is an interface, remember?
Post by H. S. Lahman
The RDM, when applied in a broader context than Codd, does not require
that. SQL also assumes a very specific paradigm for navigating table
relationships.
I think you are trying to hijack relational theory here. Do you have
any pointers to this second definition of relational theory?

Fredrik Bertilsson
http://butler.sourceforge.net
H. S. Lahman
2006-01-19 20:22:35 UTC
Permalink
Responding to Frebe...
Post by frebe
Post by H. S. Lahman
Post by frebe
Post by H. S. Lahman
SQL represents a
solution to persistence access that is designed around a particular
model of persistence itself.
Since when is the relational model a "model of persistence". Can you
provide any pointer showing that the relational model is supposed to be
a "persistence model".
I didn't say that. The RDM is a model of static data so it can be
applied to UML Class Diagrams as well. Note that I was careful to say
that SQL is a solution to persistence /access/ when the data is
represented in RDB form. As you have pointed out elsewhere one could
create a RAM-based RDB and use SQL to access it with no persistence.
You agree that the relational model is not only about persistence? But
why is the SQL language limited to only the persistence features of the
relational model?
As my last sentence indicates, SQL is not limited to persistence.
However, that is probably where 99.99% of the usage lies.
Post by frebe
Post by H. S. Lahman
Post by frebe
Post by H. S. Lahman
Try using SQL vs. flat files if you think it is independent of the
actual storage mechanism. (Actually, you probably could if the flat
files happened to be normalized to the RDM, but the SQL engine would be
a doozy and would have to be tailored locally to the files.) SQL
implements the RDB view of persistence and only the RDB view.
Yes, the files need to be normalized to the RDM but why do you make the
conclusion that SQL needs a RDB?
SQL requires the data to be in tables and tuples with embedded identity.
SQL does not require the data to be in tables. The data may reside in
flat files or RAM structures. Just because the SQL language uses the
keyword "table" does not actually mean that it needs to be backed up by
a physical table. SQL is an interface, remember?
SQL is specifically designed for the RDB implementation paradigm of the
RDM. If you want to use SQL for flat files, those files will have to be
especially formatted (e.g., with embedded identity keys) and normalized.
You could develop a SQL driver to use file names as table identity and
read lines via an implied line number as a key, but good luck on
correctly dealing with line insertions and deletions without an embedded
key.
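The warning about implied line numbers is easy to demonstrate: positional identity shifts the moment a line is inserted, while an embedded key survives the same edit. A toy sketch:

```python
lines = ["alpha", "beta", "gamma"]
beta_key = lines.index("beta")        # implied "key" is the position: 1

lines.insert(0, "zeroth")             # one insertion at the top...
print(lines[beta_key])                # ...and the old key now names "alpha"

# An embedded key is unaffected by where the row physically sits.
rows = [(10, "alpha"), (20, "beta"), (30, "gamma")]
rows.insert(0, (5, "zeroth"))
beta = next(value for key, value in rows if key == 20)
print(beta)  # beta
```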
Post by frebe
Post by H. S. Lahman
The RDM, when applied in a broader context than Codd, does not require
that. SQL also assumes a very specific paradigm for navigating table
relationships.
I think you are trying to hijack relational theory here. Do you have
any pointers to this second definition of relational theory?
The RDM is basic set theory. Codd was explicitly dealing with
persistence in a computing environment so he expressed the rules in
terms of embedded identity attributes (keys). However the set theory
only requires that each tuple have unique identity. Similarly, Codd was
only dealing with data properties but there is nothing in the underlying
set theory to preclude behavior properties, so his view represents a
specialization. Thus you will see a discussion of normalization of
Class Models in most standard OOA/D books (e.g., "Executable UML" by
Mellor and Balcer pg. 77, which happened to be the first one I pulled
off my bookshelf).
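The "basic set theory" reading can be sketched directly: a relation is a set of tuples, and a candidate key is any attribute set whose projection over the relation contains no duplicates. A minimal check (data invented):

```python
# A relation as a set of tuples over (id, name, dept).
relation = {
    (1, "Lahman", "R&D"),
    (2, "Jacobs", "IT"),
    (3, "Bertilsson", "IT"),
}

def is_candidate_key(relation, positions):
    # Project the chosen attribute positions and test for uniqueness.
    projection = [tuple(t[p] for p in positions) for t in relation]
    return len(set(projection)) == len(projection)

print(is_candidate_key(relation, (0,)))  # True: id identifies each tuple
print(is_candidate_key(relation, (2,)))  # False: dept values repeat
```

Nothing in this formulation cares whether the tuples are persistent, which is the point being made about the RDM being broader than storage.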

*************
There is nothing wrong with me that could
not be cured by a capful of Drano.

H. S. Lahman
***@pathfindermda.com
Pathfinder Solutions -- Put MDA to Work
http://www.pathfindermda.com
blog: http://pathfinderpeople.blogs.com/hslahman
(888)OOA-PATH
frebe
2006-01-22 10:33:40 UTC
Permalink
Post by H. S. Lahman
SQL is not limited to persistence.
Finally we can agree about something. Does this mean that you will stop
making this claim?
Post by H. S. Lahman
However, that is probably where 99.99% of the usage lies.
I suppose that you are talking about your usage of SQL. In an average
enterprise application, non-persistence features like queries,
transactions, referential integrity, caching, etc, are heavily used.
Post by H. S. Lahman
SQL is specifically designed for the RDB implementation paradigm of the
RDM.
Because you have created a new definition of the term RDM that is
different from Codd's definition, the distinction between RDB and RDM
is your own invention.
Post by H. S. Lahman
You could develop a SQL driver to use file names as table identity and
read lines via an implied line number as a key, but good luck on
correctly dealing with line insertions and deletions without an embedded
key.
Why would line number be the key?
Post by H. S. Lahman
The RDM is basic set theory.
Are you saying that the RDM is based on basic set theory or that the
RDM is nothing more than basic set theory?
Post by H. S. Lahman
Codd was explicitly dealing with
persistence in a computing environment so he expressed the rules in
terms of embedded identity attributes (keys). However the set theory
only requires that each tuple have unique identity.
In what way do embedded identity attributes limit Codd's RDM to
persistence? The alternative appears to be pointers, which were used in
network databases. Embedded keys vs. pointers is orthogonal to
persistent vs. in-memory.
Post by H. S. Lahman
Thus you will see a discussion of normalization of
Class Models in most standard OOA/D books
I was asking for a definition of the second definition of the RDM,
broader than Codd's definition. I was not asking for discussions about
class model normalization.

Fredrik Bertilsson
http://butler.sourceforge.net
H. S. Lahman
2006-01-22 17:13:33 UTC
Permalink
Responding to Frebe...
Post by frebe
Post by H. S. Lahman
SQL is not limited to persistence.
Finally we can agree about something. Does this mean that you will stop
making this claim?
I never made that claim.
Post by frebe
Post by H. S. Lahman
However, that is probably where 99.99% of the usage lies.
I suppose that you are talking about your usage of SQL. In an average
enterprise application, non-persistence features like queries,
transactions, referential integrity, caching, etc, are heavily used.
If they are using SQL for that in a non-CRUD/USER application for
anything other than persistence access, then they are misusing SQL.
Even in a CRUD/USER application it doesn't make much sense from a
performance viewpoint if the data is in memory.
Post by frebe
Post by H. S. Lahman
SQL is specifically designed for the RDB implementation paradigm of the
RDM.
Because you have created a new definition of the term RDM that is
different from Codd's definition, the distinction between RDB and RDM
is your own invention.
Codd's definition /is/ the RDB view; it is a specialized application of
more general set theory...
Post by frebe
Post by H. S. Lahman
You could develop a SQL driver to use file names as table identity and
read lines via an implied line number as a key, but good luck on
correctly dealing with line insertions and deletions without an embedded
key.
Why would line number be the key?
How else would you uniquely identify each line for individual access in
a text file?
Post by frebe
Post by H. S. Lahman
The RDM is basic set theory.
Are you saying that the RDM is based on basic set theory or that the
RDM is nothing more than basic set theory?
The RDM is a combination of basic set theory and predicate logic that
deals with relational calculus using terminology like relation, tuple,
and attribute. Codd's data model is an application of the RDM that
deals with relational algebra using terminology like table, row, and
field (see his original 1970 paper, "A Relational Model of Data in Large
Shared Data Banks", ACM Communications, 13, pgs. 377-387 where he
introduced the notion of representing data in tables).
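The calculus/algebra distinction drawn above can be made concrete with a tiny sketch (an editor's illustration using Python sets, not Codd's notation; the relation is invented):

```python
# A relation as a set of tuples: (name, dept, salary).
Employee = {
    ("alice", "eng", 100),
    ("bob",   "ops",  80),
    ("carol", "eng",  90),
}

# Relational-calculus flavor: describe the result by a predicate.
by_predicate = {t for t in Employee if t[1] == "eng"}

# Relational-algebra flavor: apply a selection operator.
def select(relation, pred):
    return {t for t in relation if pred(t)}

by_operator = select(Employee, lambda t: t[1] == "eng")

# Both styles name the same result set.
assert by_predicate == by_operator == {("alice", "eng", 100),
                                       ("carol", "eng", 90)}
```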

While Codd was the first to provide a formal and consistent view of the
RDM, the RDM itself has been greatly expanded over the years beyond the
RDB view. Today it can be applied to such disparate arenas as OO
development and OODBs...
Post by frebe
Post by H. S. Lahman
Codd was explicitly dealing with
persistence in a computing environment so he expressed the rules in
terms of embedded identity attributes (keys). However the set theory
only requires that each tuple have unique identity.
In what way do embedded identity attributes limit Codd's RDM to
persistence?
It doesn't. But Codd's goal was to describe persisted data and he
developed the initial view of the RDM around the notion of RDB storage.
Just look at the titles of Codd's early books and papers and try to
convince me that his research wasn't /focused/ on RDBs and persistence:

A Relational Model of Data for Large Shared Data Banks, 1970

Normalized Data Base Structure: A Tutorial, 1971

A Data Base Sublanguage Founded on the Relational Calculus, 1971

Further Normalization of the Data Base Relational Model, 1972

Relational Completeness of Data Base Languages, 1972

The Gamma-0 n-ary Relational Data Base Interface Specifications of
Objects and Operations, 1973

Recent Investigation in Relational Data Base Systems, 1974

Implementation of Relational Data Base Management Systems, 1975

He was a researcher in IBM's hard disk division, for Pete's sake!
Post by frebe
Post by H. S. Lahman
Thus you will see a discussion of normalization of
Class Models in most standard OOA/D books
I was asking for a definition of the second definition of the RDM,
broader than Codd's definition. I was not asking for discussions about
class model normalization.
For starters, try set theory. Though I am not a fan, you might also
look at Chris Date's work for descriptions of the RDM well beyond the
RDB view.


frebe
2006-01-24 06:28:46 UTC
Permalink
Post by H. S. Lahman
Codd's definition /is/ the RDB view; it is a specialized application of
more general set theory...
Search for "relational model" at Wikipedia and you will find no support
for this statement. There seems to be a consensus that Codd is the
creator of the relational model. I think it is important that we try to
keep to generally accepted definitions and not try to invent our own
definitions to support our claims.
Post by H. S. Lahman
Post by frebe
Why would line number be the key?
How else would you uniquely identify each line for individual access in
a text file?
By a separate index file. Otherwise I would need a linear search to
find the records/lines I am looking for.
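The separate-index-file idea can be sketched briefly (an editor's illustration with a hypothetical record layout, not code from the thread):

```python
# A separate index maps each embedded key to a byte offset, so lookup
# can seek directly instead of scanning the file linearly.
import io

data = io.BytesIO()   # stands in for the flat data file
index = {}            # key -> byte offset; on disk, a separate index file

for rec in [b"alice,100\n", b"bob,200\n", b"carol,300\n"]:
    key = rec.split(b",")[0].decode()
    index[key] = data.tell()   # remember where the record starts
    data.write(rec)

# Direct access: seek to the indexed offset, no linear search.
data.seek(index["carol"])
assert data.readline() == b"carol,300\n"
```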
Post by H. S. Lahman
Post by frebe
In what way do embedded identity attributes limit Codd's RDM to
persistence?
It doesn't.
Good. But why do you use this argument for claiming the opposite?
Post by H. S. Lahman
Just look at the titles of Codd's early books and papers and try to
I don't find any persistence-related word in these titles. It is only
in the OO world that Data Base == persistence. Outside this
subcommunity databases are used for much more, which I have already
many times pointed out. Search for "database" at wikipedia and count
the number of times the word "disk" or "hard drive" appear. Personally
I like the opening statement as a definition of database. "A database
is an organized collection of data."
Post by H. S. Lahman
He was a researcher in IBM's hard disk division, for Pete's sake!
Does this make his research invalid in other areas? Many research
results have proven to be useful in completely different areas. Does the
fact that Ivar Jacobson has a background in the telecom industry make
his work invalid in other areas?
Post by H. S. Lahman
you might also look at Chris Date's work for descriptions of the RDM well beyond
the RDB view.
Below is one quote from an article by Chris Date:
"In the first installment in this series, I said I expected database
systems still to be based on Codd's relational foundation a hundred
years from now. And I hope you can see, from what we've covered over
the past few months, why I believe such a thing. The relational
approach really is rock solid, owing (once again) to its basis in
mathematics and predicate logic. "

I don't think this supports your claim that Codd's definition(s) is only
a subset of the relational model.

Fredrik Bertilsson
http://butler.sourceforge.net
H. S. Lahman
2006-01-24 17:02:18 UTC
Permalink
Responding to Frebe...
Post by frebe
Post by H. S. Lahman
Codd's definition /is/ the RDB view; it is a specialized application of
more general set theory...
Search for "relational model" at Wikipedia and you will find no support
for this statement. There seems to be a consensus that Codd is the
creator of the relational model. I think it is important that we try to
keep to generally accepted definitions and not try to invent our own
definitions to support our claims.
As I said elsewhere in the message, Codd was the first to provide a
formal model using existing set theory and predicate logic. However,
his goal was to describe a specific data storage mechanism -- the RDB.
The RDM has been greatly expanded since to be applied in other contexts.
Post by frebe
Post by H. S. Lahman
Post by frebe
Why would line number be the key?
How else would you uniquely identify each line for individual access in
a text file?
By a separate index file. Otherwise I would need a linear search to
find the records/lines I am looking for.
Exactly my point. You are adding another file to form a database that
is consistent with the RDB view. How do you do it without augmenting
the stored data?
Post by frebe
Post by H. S. Lahman
Post by frebe
In what way do embedded identity attributes limit Codd's RDM to
persistence?
It doesn't.
Good. But why do you use this argument for claiming the opposite?
I never said one can't apply the RDB model to transient data. Quite the
contrary, at least three times in this thread, including right here, I
have said one could. And I have said repeatedly that OO Class Models
are normalized to the RDM.

This is the third time in this thread where you have deliberately
misrepresented me with a when-did-you-stop-beating-your-wife ploy. Ta-ta.


frebe
2006-01-24 18:41:56 UTC
Permalink
Post by H. S. Lahman
Post by frebe
Post by H. S. Lahman
How else would you uniquely identify each line for individual access in
a text file?
By a separate index file. Otherwise I would need a linear search to
find the records/lines I am looking for.
Exactly my point. You are adding another file to form a database that
is consistent with the RDB view. How do you do it without augmenting
the stored data?
Flat files can only be used for linear traversal unless you have some
sort of external index. Using it the "OO" way would not make any
difference.

Fredrik Bertilsson
http://butler.sourceforge.net
Alfredo Novoa
2006-01-25 12:44:50 UTC
Permalink
Post by frebe
Post by H. S. Lahman
However, that is probably where 99.99% of the usage lies.
I suppose that you are talking about your usage of SQL. In an average
enterprise application, non-persistence features like queries,
transactions, referential integrity, caching, etc, are heavily used.
In well-designed Information Systems all the business rules are
enforced by the DBMS.

Referential integrity is only a little part of data integrity, and all
data integrity must be enforced by the DBMS.
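The point about the DBMS itself enforcing integrity can be sketched with sqlite3 as a stand-in engine (an editor's illustration; the schema is invented): the FOREIGN KEY and CHECK constraints reject bad rows at the engine, not in application code.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")  # sqlite needs explicit opt-in
con.execute("CREATE TABLE dept (id INTEGER PRIMARY KEY)")
con.execute("""CREATE TABLE emp (
    id INTEGER PRIMARY KEY,
    dept_id INTEGER NOT NULL REFERENCES dept(id),
    salary INTEGER CHECK (salary > 0))""")

con.execute("INSERT INTO dept VALUES (1)")
con.execute("INSERT INTO emp VALUES (1, 1, 100)")  # valid row

def try_insert(sql):
    """Return True if the DBMS accepts the row, False if it rejects it."""
    try:
        con.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False

assert not try_insert("INSERT INTO emp VALUES (2, 99, 100)")  # no dept 99
assert not try_insert("INSERT INTO emp VALUES (3, 1, -5)")    # CHECK fails
```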


Regards
topmind
2006-01-21 19:06:43 UTC
Permalink
Post by H. S. Lahman
Post by topmind
Post by H. S. Lahman
Post by topmind
Post by H. S. Lahman
The point is that there are alternative /implementations/ for
persistence to RDBs in the computing space. SQL has already made that
implementation choice.
SQL is not an implementation. What is the difference between locking
yourself to SQL instead of locking yourself to Java? If you want
open-source, then go with PostgreSQL. What is the diff? Java ain't no
universal language either.
Of course it's an implementation! It implements access to physical
storage.
Just as Java implements access to physical RAM etc.
Exactly. Java is a specific implementation of a 3GL. 3GL is the
abstraction, Java is an implementation. Persistence access is the
abstraction, SQL is an implementation.
Why do you keep saying "persistence"? I don't think you get the idea of
RDBMS and query languages. Like I said, think of a RDBMS as an
"attribute management system". Forget about disk drives for now. Saying
it is only about "persistence" is simply misleading.
Post by H. S. Lahman
Post by topmind
Post by H. S. Lahman
More important to the context here, that implementation is
quite specific to one single paradigm for stored data.
Any language or API is pretty much going to target a specific paradigm
or two. I don't see any magic way around this, at least not that you
offer. UML is no different.
4GLs get around it because they are independent of /all/ computing space
implementations.
I am not sure UML qualifies as 4th Gen. Just because it can be
translated into multiple languages does not mean anything beyond Turing
Equivalency. C can be translated into Java and vice versa.
Post by H. S. Lahman
However, that's not the point. SQL is a 3GL but comparing it to Java is
specious because Java is a general purpose 3GL.
Again, this gets into the definiton of "general purpose". I agree that
query languages are not meant to do the *entire* application, but that
does not mean it is not general purpose. File systems are "general
purpose", but that does not mean that one writes an entire application
in *only* a file system. It is a general purpose *tool*, NOT intended
to be the whole enchilada.

A hammer is a general purpose tool, but that does not mean one is
supposed to ONLY use a hammer. You need to clarify your working
definition of "general purpose", and then show how it matches the
consensus definition of 4GL.
Post by H. S. Lahman
SQL represents a
solution to persistence access that is designed around a particular
model of persistence itself. So one can't even use it for general
purpose access to persistence, much less general computing.
Please clarify. Something can still be within a paradigm and be general
purpose. Further, GP does not necessarily mean "all purpose", for
nothing is practically all purpose.
Post by H. S. Lahman
Post by topmind
Post by H. S. Lahman
Requirements -> 4GL -> 3GL -> Assembly -> machine code executable
Everything on the left is a specification for what is immediately to its
right. Similarly, everything to the right is a solution implementation
for the specification on its immediate left.
Well that is a bit outdated. For one, the distinction between 4GL and
3GL is fuzzy, and many compilers/interpreters don't use assembler.
My 4GL definition isn't ambiguous, which is why I like it. Reviewers of
OOA models have no difficulty recognizing implementation pollution.
Argument by authority.
Post by H. S. Lahman
All compilers generate object code (relocatable Assembly). Most modern
interpreters can produce storable bytecodes that are equivalent to
Assembly from the VM's viewpoint. At run time one can view an
interpreter as simply combining link and load functions that transform
the bytecode to a machine instruction. But at some level the
interpreter still has to understand that MUL,R1,R2 maps into bits.
But you are reverting to ploys again by deflecting. The context is
specification vs. implementation, not how machine instructions are encoded.
You have not finished your analogy on the 3G and 4G side. Besides,
analogies often make poor evidence, being better for teaching or
illuminating.
Post by H. S. Lahman
Post by topmind
Post by H. S. Lahman
Go look at an SA/D Data Flow Diagram or a UML Activity Diagram. They
express data store access at a high level of abstraction that is
independent of the actual storage mechanism. SQL, ISAM, CODASYL, gets,
or any other access mechanism, is then an implementation of that generic
access specification.
SQL is independent of the "actual storage mechanism". It is an
interface. You may not like the interface, but that is another matter.
Repeat after me: "SQL is an interface, SQL is an interface, SQL is an
interface"....
Try using SQL vs. flat files if you think it is independent of the
actual storage mechanism. (Actually, you probably could if the flat
files happened to be normalized to the RDM, but the SQL engine would be
a doozy and would have to be tailored locally to the files.) SQL
implements the RDB view of persistence and only the RDB view.
How is that different than ANY other interface? You are claiming magic
powers of UML that it simply does not have.

And as somebody pointed out, one can use SQL on flat files too. ODBC
drivers can be created to hook SQL to spreadsheets, flat files, etc.
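The shape of the "SQL over flat files" claim can be sketched by loading a CSV-style file into an in-memory table and querying it with SQL (an editor's illustration with invented data; real ODBC text drivers do this more directly):

```python
import csv, io, sqlite3

# A flat file of parts; io.StringIO stands in for an actual file.
flat_file = io.StringIO("name,qty\nwidget,4\ngadget,9\n")
rows = list(csv.DictReader(flat_file))

con = sqlite3.connect(":memory:")  # no disk involved at all
con.execute("CREATE TABLE part (name TEXT, qty INTEGER)")
con.executemany("INSERT INTO part VALUES (?, ?)",
                [(r["name"], int(r["qty"])) for r in rows])

# Once loaded, the flat data answers ordinary SQL queries.
total, = con.execute("SELECT SUM(qty) FROM part").fetchone()
assert total == 13
```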
Post by H. S. Lahman
Post by topmind
Post by H. S. Lahman
Java is certainly a general purpose 3GL. Like most 3GLs there are
situations where there are better choices (e.g., lack of BCD arithmetic
support makes it a poor choice for a General Ledger), but one could
still use it in those situations. SQL, in contrast, is a niche language
that just doesn't work for many situations outside its niche.
You could be right, but I have yet to see a good case outside of
split-second timing issues where there is a limit to the max allowed
response time. (This does not mean that rdbms are "slow", just less
predictable WRT response time.)
If you can give an example outside of timing, please do. (I don't doubt
they exist, but I bet they are rarer than you imply. Some scientific
applications that use imaginary numbers and lots of calculus may also
fall outside.)
Compute a logarithm. You can't hedge by dismissing "scientific"
computations.
I didn't. Nothing is ideal for everything under the sun. Nothing. See
above about general-purpose tools.
Post by H. S. Lahman
Try doing forecasting in an inventory control system w/o
"scientific" computations.
I am not sure what you are implying here. I did not claim that
scientific computation was not necessary.
Post by H. S. Lahman
Or try encoding the pattern recognition that
the user of a CRUD/USER application applies to the presented data. The
reality is that IT is now solving a bunch of problems that are
computationally intensive.
As usual, "it depends". Problems where there is a lot of "chomping" on
a small set of data are probably not something DB's are good at (at
this time). An example might be the Traveling Salesman puzzle. However,
problems where the input is large and from multiple entities are more
up the DB's alley.

(It may be possible to use a DB to solve the Salesman problem quickly, but few
bother to research that area.)
Post by H. S. Lahman
Post by topmind
Post by H. S. Lahman
BTW, remember that I am a translationist. When I do a UML model, I
don't care what language the transformation engine targets in the model
implementation. (In fact, transformation engines for R-T/E typically
target straight C from the OOA models for performance reasons.) Thus
every 3GL (or Assembly) represents a viable alternative implementation
of the notion of '3GL'.
Well, UML *is* language. It is a visual language just like LabView is.
Exactly. But solutions at the OOA level are 4GLs because they can be
unambiguously implemented without change on any platform with any
computing technologies.
So can any Turing Complete language.
Post by H. S. Lahman
Post by topmind
Post by H. S. Lahman
Post by topmind
Post by H. S. Lahman
Post by topmind
Post by H. S. Lahman
UML with a compliant AAL is an example of a 4GL. If I build an OOA
model for, say, a POS Order Entry System, that model can be
unambiguously implemented without change either manually as a print mail
order catalogue or as software for a browser-based 'net application.
The fundamental processing logic of catalogue organization and order
entry is expressed the same way regardless of the implementation context.
And if other people/vendors made their own flavor of this tool with
differences between the implimentation, then it would be in the same
boat. Why should implementation A1 and A2 demote the "generation"
ranking of A?
It is not the same thing at all. The 4GL solution does not care if
persistence is /implemented/ with RDBs, OODBs, flat files, paper files,
or clay tablets.
For the zillionth time, RDBMS are far more than just "persistence".
It is only if one refuses to manage complexity by separating logical
concerns.
"Separation" is generally irrelevant in cyber-land. It is a physical
concept, not a logical one. Perhaps you mean "isolatable", which can be
made dynamic, based on needs. "Isolatable" means that there is
enough info to produce a separated *view* if and when needed. This is
the nice thing about DB's: you don't have to have One-and-only-one
separation/taxonomy up front. OO tends to want one-taxonomy-fits-all
and tries to find the One True Taxonomy, which is the fast train to
Messland. Use the virtual power of computers to compute as-needed
groupings based on metadata.
You know very well what I mean by 'separation of concerns' in a software
context, so don't waste our time recasting it. Modularity has been a
Good Practice since the late '50s.
If there is only one concern set where each concern is mutually
exclusive, then we have no disagreement. In practice there are usually
multiple "partitioning" candidates, and that is where the disagreements
usually arise. File and text systems don't make it easy to have
partitioning in all dimensions, so compromises must be made. It is "my
factor is more important than your factor, neener neener". If there is
only one way to slice the pizza, then there is no problem. But if there
are multiple ways, then a fight breaks out.

This is one reason why DB's are useful: the more info you put into the
DB instead of code, the more ad-hoc, situational partitionings you can
view. You are not forced to pick the One Right Taxonomy of
partitioning. Categorizational philosophers came to the consensus that
there is no One Right Taxonomy for most real-world things.
Post by H. S. Lahman
Post by topmind
Post by H. S. Lahman
Render unto the Disk generic static storage and render unto
the Application context-dependent dynamics.
* 1
[Context] ----------------- [Data]
1 *
[Problem Solution] -------- [Data]
The first view is the basis of the RDB paradigm -- generic storage of
the same data for access by many different contexts. The second view is
the one that is relevant for solving large problems -- access of data
that is carefully tailored to the problem in hand. Storing and
accessing data for many different contexts is a quite different problem
than formatting and manipulating data to solve a specific problem.
Again, DB's are not JUST for "storage". There are RAM-only RDBMS's.
I agree they are used that for more, but it is not my problem if
developers are determined to shoot themselves in the foot by bleeding
cohesion all over the place. It is plain bad software practice to
ignore logical modularity.
Again, in practice there are multiple incompatible modularity
candidates in non-trivial software. Life is multi-dimensional, and the
more complex the software the more factors there are.

Change impact analysis often does not help either because I found out
that people perceive change and change probabilities differently. It is
hard to plan for change when people don't perceive the future the same.
Post by H. S. Lahman
As far as RAM RDBs go, for any large non-CRUD/USER problem I can
formulate a solution (which doesn't have to be OO) that will beat your
RAM RDB for performance, and often by integer factors.
Claims claims claims. Yaawwwn.
Post by H. S. Lahman
The RDB paradigm
is not designed for context-dependent problem solving; it is designed
for generic static data storage and access in a context-independent manner.
I think what you view as context-dependent is not really context
dependent after all. It is just your pet way of viewing the world
because of all the OOP anti-DB hype.
Post by H. S. Lahman
Before you argue that the RAM RDB saves developer effort because it is
largely reusable and that may be worth more than performance, I agree.
But IME for /large/ non-CRUD/USER problems the computer is usually too
small and performance is critical.
Please clarify. Ideally the RDBMS would determine what goes into RAM
and what to disk such that the app developer doesn't have to give a
rat's rear. Cache management generally does this, but a both-way system
is probably not as fast as a dedicated RAM DB. Even if the two-way
ideal is not fully reached, one will soon have the *option* to switch
some or all of an app to a full-RAM DB as needed without rewriting the
app. The query language abstracts/encapsulates/hides that detail way.
Post by H. S. Lahman
[I could also argue that an OO solution will provide one with optimum
performance "for free" because it falls out of basic OOA/D for the
solution logic. IOW, one doesn't need that sort of reuse. But I won't
argue that because that would be going down the rabbit hole. B-)]
No, it often hard-wires in the early usage paths such that future usage
paths that go against those early paths turn into a mess. OO tends to
be really lousy at many-to-many relationships, for example.
Post by H. S. Lahman
Post by topmind
Post by H. S. Lahman
For a non-CRUD/USER application, SQL and the DBMS provide the first
relationship while a persistence access subsystem provides the
reformatting for the second relationship.
Reformatting? Please clarify.
The solution needs a different view of the data that is tailored to the
problem in hand. So the RDB view needs to be converted to the solution
view (and vice versa). IOW, one needs to reformat the RDB data
structures to the solution data structures.
This is called a "result set" or "view". Most queries customize the
data to a particular task. Thus, it *is* a solution view.
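The "result set / view" point above can be sketched with a small example (an editor's illustration; the orders table is invented): a VIEW presents the stored tables in a shape tailored to one task, without copying the data.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE orders (id INTEGER, customer TEXT, amount INTEGER);
INSERT INTO orders VALUES (1,'acme',50),(2,'acme',70),(3,'zed',10);

-- A task-tailored "solution view" over the generic stored table.
CREATE VIEW customer_totals AS
    SELECT customer, SUM(amount) AS total
    FROM orders GROUP BY customer;
""")

# The view is queried like any table.
assert con.execute(
    "SELECT total FROM customer_totals WHERE customer='acme'"
).fetchone() == (120,)
```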
Post by H. S. Lahman
Post by topmind
Post by H. S. Lahman
Post by topmind
Post by H. S. Lahman
I am talking about the abstracting the domain where the original problem
exists rather than the computing domain where a software solution will
be executed. SQL only abstracts a very narrow part of the computing domain.
I disagree. A large part of *most* apps I have seen involves
database-oriented stuff. P. May mentioned security. Security can be
viewed as a dealing with large ACL tables. Most algorithms can be
reduced to mostly DB-oriented operations. I had to build a 3D graphics engine
having "parts" reference each other in many-to-many tables,
transformation steps tracking, looking up polygons, cross-referencing
those polygons with their "parent part", storing scan-lines for later
inspection, etc. I will agree that DB's are not (currently) fast at
such, but still from a logical perspective the operations were
essentially DB-oriented. (Because I couldn't use a DB, I ended up
reinventing a lot of DB idioms and it was not very fun.)
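The "security as ACL tables" remark in the passage above can be sketched briefly (an editor's illustration with a hypothetical schema, not code from the thread): an access check is just a membership query against an ACL relation.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE acl (user_id TEXT, resource TEXT, action TEXT);
INSERT INTO acl VALUES
    ('alice','/reports','read'),
    ('alice','/reports','write'),
    ('bob','/reports','read');
""")

def allowed(user, resource, action):
    # The security decision reduces to a DB lookup, not branching code.
    row = con.execute(
        "SELECT 1 FROM acl WHERE user_id=? AND resource=? AND action=?",
        (user, resource, action)).fetchone()
    return row is not None

assert allowed("alice", "/reports", "write")
assert not allowed("bob", "/reports", "write")
```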
When the only tool you have is a Hammer, then everything looks like a
Nail.
No, out of necessity I started my career without DB usage, and I never
want to return there.
That's because you are in a CRUD/USER environment where P/R works quite
well. Try a problem like allocating a fixed marketing budget to various
national, state, and local media outlets in an optimal fashion for a
Fortune 500.
Again, I never said that DB's are good for every problem. I don't know
enough about that particular scenario to propose a DB-centric solution
and to know whether it is an exception or not.

Unless you provide some specific use-case or detailed scenario, it is
anecdote against anecdote here.

RDBMS are a common tool. The sales of Oracle, DB2, and Sybase are
gigantic.
Post by H. S. Lahman
Post by topmind
Post by H. S. Lahman
Post by topmind
Post by H. S. Lahman
<moved>What I implied was that CRUD/USER applications tend to be not very
complex. Report generation was never very taxing even back in the COBOL
days, long before SQL, RDBs, or even the RDM. Substituting a GUI or
browser UI for a printed report doesn't change the fundamental nature of
the processing.
Please clarify. If a process was "not taxing", then you are simply
given more duties and projects to work on. Management loads you up
based on your productivity and work-load.
Back in the '60s and early '70s writing COBOL code to extract data and
format reports was a task given to the entry level programmers. That's
where the USER acronym (Update, Sort, Extract, Report) came from. The
stars went on to coding Payroll and Inventory Control where one had to
encode business rules and policies to solve specific problems.
Fine, show how OO better solves business rule management. (Many if not
most biz rules can be encoded as data, BTW, if you know how.)
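The parenthetical claim that business rules can be encoded as data can be sketched in a few lines (an editor's illustration; the discount policy and its thresholds are invented): the policy lives in a rule table and is interpreted generically, instead of being hard-coded as branches.

```python
# Rule table: (min_total, discount_rate). Changing policy means
# editing rows, not editing code.
rules = [
    (1000, 0.10),
    (500,  0.05),
    (0,    0.00),
]

def discount(order_total):
    # Pick the highest threshold the order qualifies for.
    for threshold, rate in sorted(rules, reverse=True):
        if order_total >= threshold:
            return rate
    return 0.0

assert discount(1200) == 0.10
assert discount(600) == 0.05
assert discount(100) == 0.00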
Why? That has nothing to do with whether a DBMS should execute dynamic
business rules and policies. This isn't an OO vs. P/R discussion, much
as you would like to make it so.
Are you saying it is a UML-versus-RDB debate?
Post by H. S. Lahman
Post by topmind
Post by H. S. Lahman
It is only when the
problem solution gets drawn into the software that one leaves the realm
of CRUD/USER processing and thing start to get tricky.
Post by topmind
Post by H. S. Lahman
Unfortunately, I agree with May that the rest of the paragraph makes no
sense; it just seems to be your personal jargon and mantras.
They are part of
the predictable collection of forensic ploys you use when debating OO
people. It's all designed to have an emotional effect to put the
opponent on tilt.
You seem to get your amusement out of having OO people go bonkers.
I will admit there is a certain satisfaction in using other people's
own logic against them, especially if they have insulted me
first.
That doesn't answer why you went to the trouble of creating an
inflammatory website and have been here for years. A simple dislike of
OO? I don't think so. How many converts have you made to justify your
crusade?
I get enough "amen brother's" to provide all the social satisfaction I
need from it.
Post by H. S. Lahman
It just wouldn't be worth the effort of beating your head
against the wall all these years. So you have to have some other
reason. The only plausible reasons I see are Quixotic masochism or you
enjoy pulling people's chains.
Perhaps an Asperger's Syndrome: obsession with a specific narrow topic.
Whatever, if you want to sit around and speculate on my motivation, be
my guest. Frankly, I am not that important to waste time on.
Post by H. S. Lahman
As far as insulting you is concerned, what do you expect? You throw out
inflammatory statements, especially misconceptions about what OO
development is about, that are designed to drive anyone who understands
OO up a tree. If I used my knowledge of OO and tried to design a
website that would drive OO people to outrage, it would be your
geocities website. It pushes all the buttons in admirable fashion.
(That you can push all the right buttons is what makes me believe you
actually understand a lot more about OO than you pretend; it would be
difficult to be so inflammatory without that knowledge.) So I have to
conclude it is intentional. When you jump up and down on the bellows
long enough, you will get burned.
Whatever. If OO were truly great you could demonstrate it with a coded
business example that many if not most OO proponents would agree is
good OO. You can't. BilliOOns of dOOllars spent based on anecdotes,
bragging, and brochure-talk.

[....]
Post by H. S. Lahman
Post by topmind
Post by H. S. Lahman
Post by topmind
You are so cute when you paint me as bad, manipulative, and evil.
Not bad or evil, but definitely manipulative. You just find it amusing
to pull people's chains and the OO community is providing plenty of soft
targets. As I've said before, I think you are actually know a lot more
about OO development than you let on and you are pretty clever about the
way you tweak the OO people who engage with you.
You are spreading falsehoods about RDBMS. They are NOT low-level. You
only treat them as low level.
Where did I say that? I said that once one is out of the realm of
CRUD/USER processing, /talking/ to persistence is a low level service
_within the application_. How persistence is implemented outside the
application is a whole other story.
Well, we both agree that DB's are not for everything. However, we
disagree widely on where the limits lie.

RDBMS tend to not be the right tool where performance, hardware
packaging, or timing is more critical than change management. If
something changes often, then an RDBMS is a more general-purpose
solution. This is not to say that RDBMSs are slow; they just will not be
competitive with a critical system designed for a very specific,
slow-changing purpose. But for the budget-minded who don't want to
build low-level tools from scratch and want flexibility, DB's are the
way to go.

I believe most cases where DB's are not appropriate for the application
will fall into the above category.
Post by H. S. Lahman
*************
There is nothing wrong with me that could
not be cured by a capful of Drano.
H. S. Lahman
-T-
H. S. Lahman
2006-01-22 19:57:40 UTC
Permalink
Responding to Jacobs...
Post by topmind
Post by H. S. Lahman
Post by topmind
Post by H. S. Lahman
Post by topmind
SQL is not an implementation. What is the difference between locking
yourself to SQL instead of locking yourself to Java? If you want
open-source, then go with PostgreSQL. What is the diff? Java ain't no
universal language either.
Of course it's an implementation! It implements access to physical
storage.
Just as Java implements access to physical RAM etc.
Exactly. Java is a specific implementation of a 3GL. 3GL is the
abstraction, Java is an implementation. Persistence access is the
abstraction, SQL is an implementation.
Why do you keep saying "persistence"? I don't think you get the idea of
RDBMS and query languages. Like I said, think of a RDBMS as an
"attribute management system". Forget about disk drives for now. Saying
it is only about "persistence" is simply misleading.
Persistent data is data that is stored externally between executions of
an application. RDBs are a response to that need combined with a
requirement that access be generic (i.e., the data can be accessed by
many different applications, each with unique usage contexts). That's
what DBMSes do -- they manage persistent data storage and provide
generic, context-independent access to that data storage.

My point in this subthread is that such responsibilities are complicated
enough in practice that one does not want the DBMS to also manage and
execute dynamic business rules and policies. IOW, the DBMS should just
mind its own store. [This thread has been a veritable hotbed of puns.
I've probably made more in this thread than I've done in the last
decade. B-)]
Post by topmind
Post by H. S. Lahman
Post by topmind
Post by H. S. Lahman
More important to the context here, that implementation is
quite specific to one single paradigm for stored data.
Any language or API is pretty much going to target a specific paradigm
or two. I don't see any magic way around this, at least not that you
offer. UML is no different.
4GLs get around it because they are independent of /all/ computing space
implementations.
I am not sure UML qualifies as 4th Gen. Just because it can be
translated into multiple languages does not mean anything beyond Turing
Equivalency. C can be translated into Java and vice versa.
A UML OOA model can be implemented unambiguously and without change in a
manual system. In fact, that is a test reviewers use to detect
implementation pollution. The OOA model for, say, a catalogue-driven
order entry system will look exactly the same whether it is implemented
as a 19th century mail-in Sears catalogue or a modern browser-based web
application. That is not true for any 3GL.
Post by topmind
Post by H. S. Lahman
However, that's not the point. SQL is a 3GL but comparing it to Java is
specious because Java is a general purpose 3GL.
Again, this gets into the definition of "general purpose". I agree that
query languages are not meant to do the *entire* application, but that
does not mean it is not general purpose. File systems are "general
purpose", but that does not mean that one writes an entire application
in *only* a file system. It is a general purpose *tool*, NOT intended
to be the whole enchilada.
Huh?!? If you can't write the entire application in it, then it isn't
general purpose by definition.
Post by topmind
A hammer is a general purpose tool, but that does not mean one is
supposed to ONLY use a hammer. You need to clarify your working
definition of "general purpose", and then show it the consensus
definition for 4GL.
huh**2?!? A hammer is not a general purpose tool by any stretch of the
imagination.
Post by topmind
Post by H. S. Lahman
SQL represents a
solution to persistence access that is designed around a particular
model of persistence itself. So one can't even use it for general
purpose access to persistence, much less general computing.
Please clarify. Something can still be within a paradigm and be general
purpose. Further GP does not necessarily mean "all purpose", for
nothing is practically all purpose.
SQL is designed around the RDB paradigm for persistence. It can't be
used for, say, accessing lines in a text flat file because the text file
does not organize the data the way SQL expects. So SQL is not a
general purpose interface to stored data. Apropos of your point,
though, SQL is quite general purpose for accessing /any/ data in a
uniform way from a data store _organized like an RDB_.
Post by topmind
Post by H. S. Lahman
Post by topmind
Post by H. S. Lahman
Requirements -> 4GL -> 3GL -> Assembly -> machine code executable
Everything on the left is a specification for what is immediately to its
right. Similarly, everything to the right is a solution implementation
for the specification on its immediate left.
Well that is a bit outdated. For one, the distinction between 4GL and
3GL is fuzzy, and many compilers/interpreters don't use assembler.
My 4GL definition isn't ambiguous, which is why I like it. Reviewers of
OOA models have no difficulty recognizing implementation pollution.
Argument by authority.
I prefer to think of it as argument by rational practicality.
Post by topmind
Post by H. S. Lahman
Post by topmind
Post by H. S. Lahman
Go look at an SA/D Data Flow Diagram or a UML Activity Diagram. They
express data store access at a high level of abstraction that is
independent of the actual storage mechanism. SQL, ISAM, CODASYL, gets,
or any other access mechanism, is then an implementation of that generic
access specification.
SQL is independent of the "actual storage mechanism". It is an
interface. You may not like the interface, but that is another matter.
Repeat after me: "SQL is an interface, SQL is an interface, SQL is an
interface"....
Try using SQL vs. flat files if you think it is independent of the
actual storage mechanism. (Actually, you probably could if the flat
files happened to be normalized to the RDM, but the SQL engine would be
a doozy and would have to be tailored locally to the files.) SQL
implements the RDB view of persistence and only the RDB view.
How is that different than ANY other interface? You are claiming magic
powers of UML that it simply does not have.
There is a distinction between describing an interface and designing its
semantics. UML is quite capable of describing the semantics of any
interface. Deciding what the semantics should be is quite another thing
that the developer owns.

When I have a subsystem in my application to access persistent data,
that subsystem has an interface that the rest of the application talks
to. That interface is designed around the rest of the application's
data needs, not the persistence mechanisms. It is the job of the
persistence access subsystem to convert the problem solution's data
needs into the access mechanisms de jour.

If the persistence is an RDB, then the subsystem implementation will
<probably> use SQL. If the persistence is flat text files, it will use
the OS file manager and streaming facilities. If it is clay tablets, it
will use an OCR and stylus device driver API. That allows me to plug &
play the persistence mechanisms without touching the application
solution because it still talks to the same interface regardless of the
implementation of the subsystem.

IOW, the semantics of the interface to the subsystem is /designed/ at a
different level of abstraction than that of the subsystem
implementation. UML doesn't care about the design process; it just
represents the results.
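Lahman's plug-and-play subsystem can be sketched as an abstract interface that the solution code talks to while implementations vary freely behind it. This is a minimal Python illustration; the class and method names are invented, not taken from any post:

```python
from abc import ABC, abstractmethod

class OrderStore(ABC):
    """Interface designed around the solution's data needs,
    not around any particular persistence mechanism."""
    @abstractmethod
    def load(self, order_id): ...
    @abstractmethod
    def save(self, order_id, order): ...

class InMemoryStore(OrderStore):
    """One interchangeable implementation; an RDB- or file-backed
    one would plug in without touching the solution code."""
    def __init__(self):
        self._rows = {}
    def load(self, order_id):
        return self._rows[order_id]
    def save(self, order_id, order):
        self._rows[order_id] = order

store: OrderStore = InMemoryStore()
store.save(1, {"sku": "X1", "qty": 3})
print(store.load(1))  # {'sku': 'X1', 'qty': 3}
```

The application never sees which backend is in play; only the subsystem behind `OrderStore` changes when the persistence mechanism does.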
Post by topmind
And as somebody pointed out, one can use SQL on flat files too. ODBC
drivers can be created to hook SQL to spreadsheets, flat files, etc.
Only if the data is organized around embedded identity and normalized.
Even then such drivers carry substantial overhead and tend to be highly
tailored to specific applications. IOW, you need a different driver for
every context (e.g., a spreadsheet) and then it won't be as efficient as
an access paradigm designed specifically for the storage paradigm.
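Topmind's point that SQL can be applied to flat-file data, once that data is loaded into a relational shape, can be sketched with Python's stdlib sqlite3. The file contents, table, and column names are invented for illustration:

```python
import csv
import io
import sqlite3

# Hypothetical flat file: normalized rows with embedded identity.
flat_file = io.StringIO("id,name,dept\n1,Ann,Sales\n2,Bob,Ops\n3,Cai,Sales\n")

# Load the flat file into an in-memory table, then query it with SQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (id INTEGER, name TEXT, dept TEXT)")
conn.executemany("INSERT INTO emp VALUES (?, ?, ?)",
                 [(r["id"], r["name"], r["dept"])
                  for r in csv.DictReader(flat_file)])

# SQL now works against what started life as a flat file.
rows = conn.execute(
    "SELECT dept, COUNT(*) FROM emp GROUP BY dept ORDER BY dept").fetchall()
print(rows)  # [('Ops', 1), ('Sales', 2)]
```

Note that this only works because the file happened to be normalized, which is exactly the caveat Lahman raises above.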
Post by topmind
Post by H. S. Lahman
Post by topmind
Post by H. S. Lahman
Java is certainly a general purpose 3GL. Like most 3GLs there are
situations where there are better choices (e.g., lack of BCD arithmetic
support makes it a poor choice for a General Ledger), but one could
still use it in those situations. SQL, in contrast, is a niche language
that just doesn't work for many situations outside its niche.
You could be right, but I have yet to see a good case outside of
split-second timing issues where there is a limit to the max allowed
response time. (This does not mean that rdbms are "slow", just less
predictable WRT response time.)
If you can give an example outside of timing, please do. (I don't doubt
they exist, but I bet they are rarer than you imply. Some scientific
applications that use imaginary numbers and lots of calculus may also
fall outside.)
Compute a logarithm. You can't hedge by dismissing "scientific"
computations.
I didn't. Nothing is ideal for everything under the sun. Nothing. See
above about general-purpose tools.
Post by H. S. Lahman
Try doing forecasting in an inventory control system w/o
"scientific" computations.
I am not sure what you are implying here. I did not claim that
scientific computation was not necessary.
I was just anticipating your deflection; you've been using the
give-me-an-example ploy for years. B-) When the example is provided
you deflect by attacking it on grounds unrelated to the original point.
That's usually easy to do because examples are deliberately kept
simple to focus on the point in hand. That allows you to bring in
unstated requirements, programming practices designed for other
contexts, and whatnot to attack the example on grounds unrelated to the
original point. In this case, though, you screwed up by setting up a
basis for the deflection ahead of time.

You asked for an example outside of "timing". The main reason SQL isn't
a general purpose 3GL is that it can't handle dynamics (algorithmic
processing) very well. So the obvious examples are going to tend to be
algorithmic, such as computing a logarithm. But your parenthetical
hedge set up a basis for dismissing any obvious example as "scientific"
when you subsequently deflect. Then later you can argue the point was
never demonstrated.
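For concreteness, the "compute a logarithm" example: the iterative processing below is trivial in a 3GL but has no natural expression in plain SQL. This is a hedged sketch using Newton's method, not code from the original posts:

```python
import math

LN2 = 0.6931471805599453  # ln(2)

def ln(x, tol=1e-12):
    """Natural log via Newton's method on f(y) = exp(y) - m.

    Reduce x to m * 2**e with 0.5 <= m < 1, iterate for ln(m),
    then add e*ln(2). Plain iterative, 3GL-style computation.
    """
    if x <= 0:
        raise ValueError("ln requires x > 0")
    m, e = math.frexp(x)      # x == m * 2**e
    y = m - 1.0               # decent first guess for ln(m)
    while True:
        step = 1.0 - m * math.exp(-y)   # Newton step f(y)/f'(y)
        y -= step
        if abs(step) < tol:
            break
    return y + e * LN2

print(round(ln(10.0), 6))  # 2.302585
```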
Post by topmind
Post by H. S. Lahman
Or try encoding the pattern recognition that
the user of a CRUD/USER application applies to the presented data. The
reality is that IT is now solving a bunch of problems that are
computationally intensive.
As usual, "it depends". Problems where there is a lot of "chomping" on
a small set of data are probably not something DB's are good at (at
this time). An example might be the Traveling Salesman puzzle. However,
problems where the input is large and from multiple entities are more
up the DB's alley.
The Traveling Salesman problem can be arbitrarily large and the RDB
model will still probably not be useful because...

<aside>
FYI, most of the Operations Research algorithms are actually pretty
simple when written out in equations and the core processing doesn't
require a lot of code. Typically most of the code is involved in
getting the data into the application, setting up data structures, and
reporting the results. In addition, the interesting problems are huge
and involve vast amounts of data.

For example, the logistics for the '44 D-Day invasion of Normandy held
the record as the largest linear programming problem ever solved well
into the '70s. The equations for the Simplex solution were written in a
few lines but the pile of data processed was humongous and the actual
execution took months. (It had to be split up into many chunks because
of the MTTF of the computer hardware and a lot of preprocessing was done
by acres of clerks with hand-cranked calculators.)
</aside>
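The aside's claim that OR algorithms are "pretty simple when written out" can be illustrated by the very simplest linear form of a budget-allocation problem, which reduces to a few lines of greedy code; real instances need LP/Simplex, as the aside notes. All outlet names, returns, and figures below are invented:

```python
# Toy budget allocation: spend a fixed budget on outlets in order of
# estimated return per dollar, up to each outlet's capacity.
outlets = [("national_tv", 1.8, 60), ("state_radio", 2.1, 30),
           ("local_print", 1.2, 50)]  # (name, return per $, max spend)
budget = 100.0

plan = {}
for name, ret, cap in sorted(outlets, key=lambda o: -o[1]):
    spend = min(cap, budget)   # fund the best outlet first, then the next
    plan[name] = spend
    budget -= spend

print(plan)  # {'state_radio': 30, 'national_tv': 60, 'local_print': 10.0}
```

As in the D-Day example, the "equations" are tiny; in practice the bulk of the work is marshalling the data in and the results out.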
Post by topmind
(It may be possible to use a DB to solve Salesmen quickly, but few
bother to research that area.)
Unlikely. It's an np-Complete problem so the worst case always involves
an exhaustive search of all possible combinations (i.e., O(N!)). The
exotic algorithms just provide /average/ performance that approaches
O(NlogN). But those algorithms require data structures that are highly
tailored to the solution. And because of the crunching one wants
identity in the form of array indices, not embedded in tables or the
problem doesn't get solved in a lifetime.
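For concreteness, brute-force TSP looks like the sketch below; the (n-1)! tours are what make exhaustive search hopeless as n grows. The distance matrix is a made-up toy instance:

```python
from itertools import permutations

def tsp_brute_force(dist):
    """Exhaustive search over all tours starting and ending at city 0.

    dist is a square matrix of pairwise distances; the loop visits
    (n-1)! permutations, so this only works for tiny n.
    """
    n = len(dist)
    best = None
    for perm in permutations(range(1, n)):
        tour = (0,) + perm + (0,)
        cost = sum(dist[a][b] for a, b in zip(tour, tour[1:]))
        if best is None or cost < best[0]:
            best = (cost, tour)
    return best

# Toy 4-city instance (invented distances).
d = [[0, 2, 9, 10],
     [2, 0, 6, 4],
     [9, 6, 0, 3],
     [10, 4, 3, 0]]
print(tsp_brute_force(d))  # (18, (0, 1, 3, 2, 0))
```

Note the solution lives in plain indexed arrays, matching Lahman's point that the crunching wants identity as array indices rather than table rows.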
Post by topmind
Post by H. S. Lahman
Post by topmind
Post by H. S. Lahman
BTW, remember that I am a translationist. When I do a UML model, I
don't care what language the transformation engine targets in the model
implementation. (In fact, transformation engines for R-T/E typically
target straight C from the OOA models for performance reasons.) Thus
every 3GL (or Assembly) represents a viable alternative implementation
of the notion of '3GL'.
Well, UML *is* a language. It is a visual language just like LabView is.
Exactly. But solutions at the OOA level are 4GLs because they can be
unambiguously implemented without change on any platform with any
computing technologies.
So can any Turing Complete language.
And your point is...?

On separation of concerns of problem solving dynamics vs. data
Post by topmind
Post by H. S. Lahman
Post by topmind
"Separation" is generally irrelevant in cyber-land. It is a physical
concept, not a logical one. Perhaps you mean "isolatable", which can be
made to be dynamic, based on needs. "Isolatable" means that there is
enough info to produce a seperated *view* if and when needed. This is
the nice thing about DB's: you don't have to have One-and-only-one
separation/taxonomy up front. OO tends to want one-taxonomy-fits-all
and tries to find the One True Taxonomy, which is the fast train to
Messland. Use the virtual power of computers to compute as-needed
groupings based on metadata.
You know very well what I mean by 'separation of concerns' in a software
context, so don't waste our time recasting it. Modularity has been a
Good Practice since the late '50s.
If there is only one concern set where each concern is mutually
exclusive, then we have no disagreement. In practice there are usually
multiple "partitioning" candidates, and that is where the disagreements
usually arise. File and text systems don't make it easy to have
partitioning in all dimensions, so compromises must be made. It is "my
factor is more important than your factor, neener neener". If there is
only one way to slice the pizza, then there is no problem. But if there
are multiple ways, then a fight breaks out.
This is one reason why DB's are useful: the more info you put into the
DB instead of code, the more ad-hoc, situational partitionings you can
view. You are not forced to pick the One Right Taxonomy of
partitioning. Categorizational philosophers came to the consensus that
there is no One Right Taxonomy for most real-world things.
There are three accepted criteria for application partitioning (i.e.,
separating concerns at the scale of subsystems): Subject matter, level
of abstraction, and requirements allocation via client/service
relationships. (BTW, this has nothing to do with OO; it is basic
Systems Engineering.)

Subject matter: Clearly static data storage and providing generic access
to it is a different subject matter than solving Problem X.

Level of abstraction: Outside CRUD/USER processing the detailed
manipulation of data storage (e.g. ,two-phased commit) is clearly at a
much lower level of abstraction than the algorithmic processing the
solves a particular problem. IOW, the application solution is
completely indifferent to where and how data is stored. One should be
able to solve the problem the same way regardless of what the
persistence mechanisms are. That substitutability means that the
problem solution is at a higher level of abstraction than the
persistence mechanisms.

Requirements Allocation: Clearly the requirements for persistence
implementation and access are quite different than the requirements on
the specific solution of Problem X.

So under all three of these criteria it makes sense to separate the
concerns of persistence from individual problem solutions. That's
exactly what DBMSes do. The problems only come into play when one
violates that separation of concerns and starts bleeding specific
problem solutions into the DBMS itself.
Post by topmind
Post by H. S. Lahman
The RDB paradigm
is not designed for context-dependent problem solving; it is designed
for generic static data storage and access in a context-independent manner.
I think what you view as context-dependent is not really context
dependent after all. It is just your pet way of viewing the world
because of all the OOP anti-DB hype.
My view of context-dependence is the solution to a /particular/ problem.
Each application solves a unique problem. IOW, the problem is the
context. RDBs provide persistence that allows all the applications to
access the data in a uniform way regardless of what specific problem
they are solving.

Whether one can solve the problem in a reasonable fashion with the data
structures mapped to the RDB structure depends on the nature of the
problem. For CRUD/USER processing one can. For problems outside that
realm one can't so one needs to convert data into structures tailored to
the problem in hand.

[Note that this is relevant to the point above about providing SQL
drivers for different storage paradigms. That makes sense for CRUD/USER
environments because one is already employing SQL as the norm. So long
as the exceptions requiring a special driver are fairly rare, one can
justify the single access paradigm. However, it makes no sense at all
for non-CRUD/USER environments because one has to reformat the data to
the problem solution anyway. So rather than reformatting twice, one
should just reformat once from a driver that optimizes for the storage
paradigm.]
Post by topmind
Post by H. S. Lahman
Before you argue that the RAM RDB saves developer effort because it is
largely reusable and that may be worth more than performance, I agree.
But IME for /large/ non-CRUD/USER problems the computer is usually too
small and performance is critical.
Please clarify. Ideally the RDBMS would determine what goes into RAM
and what to disk such that the app developer doesn't have to give a
rat's rear. Cache management generally does this, but a two-way system
is probably not as fast as a dedicated RAM DB. Even if the two-way
ideal is not fully reached, one will soon have the *option* to switch
some or all of an app to a full-RAM DB as needed without rewriting the
app. The query language abstracts/encapsulates/hides that detail away.
This is another non sequitur deflection. Caching and whatnot is not
relevant to the point I was making. There is a business trade-off
between run-time performance and developer development time that every
shop must make. Sometimes greater developer productivity can justify
reusing the RDB paradigm when more efficient specific solutions are
available.

However, my point was that those situations tend to map to CRUD/USER
processing. Once problems become more complex than format conversions
in UI/DB pipeline applications, performance becomes the dominant
consideration. I spent years solving large np-Complete problems on
machines like PDP11s and there was no contest on that issue; customers
simply would not spring for Crays in their systems but they would spring
for a marginal extra developer cost prorated across all systems.
Post by topmind
Post by H. S. Lahman
Post by topmind
Post by H. S. Lahman
For a non-CRUD/USER application, SQL and the DBMS provide the first
relationship while a persistence access subsystem provides the
reformatting for the second relationship.
Reformatting? Please clarify.
The solution needs a different view of the data that is tailored to the
problem in hand. So the RDB view needs to be converted to the solution
view (and vice versa). IOW, one needs to reformat the RDB data
structures to the solution data structures.
This is called a "result set" or "view". Most queries customize the
data to a particular task. Thus, it *is* a solution view.
That formatting is cosmetic. The most sophisticated reformatting is
combining data from multiple tables in a join into a single table dataset.
I am talking about data structures whose semantics are different,
whose access paradigms are different, whose relationships are different,
and/or whose structure is different. IOW, there isn't a 1:1 mapping to
the RDB. For example, if my solution requires the data to be organized
hierarchically, SQL queries can't do that.
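The reformatting Lahman describes, from a flat result set to a solution-shaped structure, can be as small as the sketch below (hypothetical data; the hierarchy is the "solution view" the RDB's table shape doesn't provide):

```python
# Flat rows as they might come back from a join: (region, store, sales).
rows = [
    ("East", "Boston", 120),
    ("East", "NYC", 300),
    ("West", "Seattle", 210),
]

# Reshape the table-shaped result set into the hierarchical
# structure the solution code wants to walk.
tree = {}
for region, store, sales in rows:
    tree.setdefault(region, {})[store] = sales

print(tree)  # {'East': {'Boston': 120, 'NYC': 300}, 'West': {'Seattle': 210}}
```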
Post by topmind
Post by H. S. Lahman
Post by topmind
Post by H. S. Lahman
When the only tool you have is a Hammer, then everything looks like a
Nail.
No, out of necessity I started my career without DB usage, and I never
want to return there.
That's because you are in a CRUD/USER environment where P/R works quite
well. Try a problem like allocating a fixed marketing budget to various
national, state, and local media outlets in an optimal fashion for a
Fortune 500.
Again, I never said that DB's are good for every problem. I don't know
enough about that particular scenario to propose a DB-centric solution
and to know whether it is an exception or not.
Unless you provide some specific use-case or detailed scenario, it is
anecdote against anecdote here.
RDBMS are a common tool. The sales of Oracle, DB2, and Sybase are
gigantic.
Of course they are. They provide a generic, context-independent access
to stored data that any application can use. That's why they exist.
But that is beside the point.

The issue here is where individual business problems should get solved.
My assertion is that is an application responsibility. For CRUD/USER
processing one can use the same data structures in the solution as in
the RDB so P/R as a software development paradigm works well.
Generally, though, one can't use the same data structures once one is
out of the CRUD/USER realm so P/R doesn't work very well.
Post by topmind
Post by H. S. Lahman
Post by topmind
Post by H. S. Lahman
Post by topmind
Post by H. S. Lahman
<moved>What I implied was that CRUD/USER applications tend to be not very
complex. Report generation was never very taxing even back in the COBOL
days, long before SQL, RDBs, or even the RDM. Substituting a GUI or
browser UI for a printed report doesn't change the fundamental nature of
the processing.
Please clarify. If a process was "not taxing", then you are simply
given more duties and projects to work on. Management loads you up
based on your productivity and work-load.
Back in the '60s and early '70s writing COBOL code to extract data and
format reports was a task given to the entry level programmers. That's
where the USER acronym (Update, Sort, Extract, Report) came from. The
stars went on to coding Payroll and Inventory Control where one had to
encode business rules and policies to solve specific problems.
Fine, show how OO better solves business rule management. (Many if not
most biz rules can be encoded as data, BTW, if you know how.)
Why? That has nothing to do with whether a DBMS should execute dynamic
business rules and policies. This isn't an OO vs. P/R discussion, much
as you would like to make it so.
Are you saying it is a UML-versus-RDB debate?
Another deflection. How do you get from how complex report generation
software is to UML vs. RDB? The topic here has nothing to do with OO,
P/R, or UML. It is about the complexity of processing for CRUD/USER
applications vs. other applications.


*************
There is nothing wrong with me that could
not be cured by a capful of Drano.

H. S. Lahman
***@pathfindermda.com
Pathfinder Solutions -- Put MDA to Work
http://www.pathfindermda.com
blog: http://pathfinderpeople.blogs.com/hslahman
(888)OOA-PATH
topmind
2006-01-23 06:08:52 UTC
Permalink
(Part 1 of reply)
Post by H. S. Lahman
Post by topmind
Post by H. S. Lahman
Post by topmind
Post by H. S. Lahman
Post by topmind
SQL is not an implementation. What is the difference between locking
yourself to SQL instead of locking yourself to Java? If you want
open-source, then go with PostgreSQL. What is the diff? Java ain't no
universal language either.
Of course it's an implementation! It implements access to physical
storage.
Just as Java implements access to physical RAM etc.
Exactly. Java is a specific implementation of a 3GL. 3GL is the
abstraction, Java is an implementation. Persistence access is the
abstraction, SQL is an implementation.
Why do you keep saying "persistence"? I don't think you get the idea of
RDBMS and query languages. Like I said, think of a RDBMS as an
"attribute management system". Forget about disk drives for now. Saying
it is only about "persistence" is simply misleading.
Persistent data is data that is stored externally between executions of
an application. RDBs are a response to that need combined with a
requirement that access be generic (i.e., the data can be accessed by
many different applications, each with unique usage contexts). That's
what DBMSes do -- they manage persistent data storage and provide
generic, context-independent access to that data storage.
I disagree. You can use them that way, but I tend to view them as an
"attribute management system". They do well modelling "things" in the
real world (and virtual things) by keeping track of attributes of them.
They also provide important and useful services such as concurrency
management, joins (cross-referencing), sorting, and aggregation (sums,
counts, averages, etc.) Mere persistence does NOT have to include
things such as joins and aggregation, making one do them in app code
instead.

Back in my desktop-DB days, I created a lot of temporary tables to do
things such as joins, filtering, and aggregation for task-specific
temporary uses. The results were not kept beyond the task/module. Thus,
I was using DB tools *without* any sense of "lasting".

What would *you* call that? "Persistence" does not apply there.
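Topmind's non-persistent use of DB machinery (joins and aggregation with nothing kept afterward) is easy to reproduce with an in-memory database. A sketch with sqlite3; table and data are invented:

```python
import sqlite3

# An in-memory database: full join/aggregation machinery, zero persistence.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE orders (id INTEGER, cust TEXT, amount REAL);
    INSERT INTO orders VALUES (1, 'acme', 50.0), (2, 'acme', 25.0),
                              (3, 'globex', 10.0);
""")
totals = db.execute(
    "SELECT cust, SUM(amount) FROM orders GROUP BY cust ORDER BY cust"
).fetchall()
print(totals)  # [('acme', 75.0), ('globex', 10.0)]
db.close()  # nothing outlives the task
```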
Post by H. S. Lahman
My point in this subthread is that such responsibilities are complicated
enough in practice that one does not want the DBMS to also manage and
execute dynamic business rules and policies. IOW, the DBMS should just
mind its own store. [This thread has been a veritable hotbed of puns.
I've probably made more in this thread than I've done in the last
decade. B-)]
I agree that code does some things better and DB other things, and one
uses them *together* in a Yin-Yang fashion. They complement each other.
Post by H. S. Lahman
Post by topmind
Post by H. S. Lahman
Post by topmind
Post by H. S. Lahman
More important to the context here, that implementation is
quite specific to one single paradigm for stored data.
Any language or API is pretty much going to target a specific paradigm
or two. I don't see any magic way around this, at least not that you
offer. UML is no different.
4GLs get around it because they are independent of /all/ computing space
implementations.
I am not sure UML qualifies as 4th Gen. Just because it can be
translated into multiple languages does not mean anything beyond Turing
Equivalency. C can be translated into Java and vice versa.
A UML OOA model can be implemented unambiguously and without change in a
manual system. In fact, that is a test reviewers use to detect
implementation pollution. The OOA model for, say, a catalogue-driven
order entry system will look exactly the same whether it is implemented
as a 19th century mail-in Sears catalogue or a modern browser-based web
application. That is not true for any 3GL.
One can execute Java code by hand also. If you follow the spec, it
should always come out the same. (There may be minor vendor differences
due to errors or fuzzy areas in the spec, but this is true of any
non-trivial tech language, including UML.)
Post by H. S. Lahman
Post by topmind
Post by H. S. Lahman
However, that's not the point. SQL is a 3GL but comparing it to Java is
specious because Java is a general purpose 3GL.
Again, this gets into the definition of "general purpose". I agree that
query languages are not meant to do the *entire* application, but that
does not mean it is not general purpose. File systems are "general
purpose", but that does not mean that one writes an entire application
in *only* a file system. It is a general purpose *tool*, NOT intended
to be the whole enchilada.
Huh?!? If you can't write the entire application in it, then it isn't
general purpose by definition.
Post by topmind
A hammer is a general purpose tool, but that does not mean one is
supposed to ONLY use a hammer. You need to clarify your working
definition of "general purpose", and then show it the consensus
definition for 4GL.
huh**2?!? A hammer is not a general purpose tool by any stretch of the
imagination.
Okay, then what is a "general purpose tool"? If I was going to put
together a tool box for a trip where the mission details are not given
ahead of time, I would certainly pack a hammer. Only an idiot would
not. No, it is not a one-size-fits-all tool, and I don't expect one.
Good apps don't need a one-size-fits-all language because they can use
yin-yang complementary tools.
Post by H. S. Lahman
Post by topmind
Post by H. S. Lahman
SQL represents a
solution to persistence access that is designed around a particular
model of persistence itself. So one can't even use it for general
purpose access to persistence, much less general computing.
Please clarify. Something can still be within a paradigm and be general
purpose. Further GP does not necessarily mean "all purpose", for
nothing is practically all purpose.
SQL is designed around the RDB paradigm for persistence. It can't be
used for, say, accessing lines in a text flat file because the text file
does not organize the data the way SQL expects. So SQL is not a
general purpose interface to stored data. Apropos of your point,
though, SQL is quite general purpose for accessing /any/ data in a
uniform way from a data store _organized like an RDB_.
Well, I agree that SQL is probably not a very good way to reference
free-form text. However, just because it is not good for everything
does not mean it is not general purpose. Again, NOTHING is good at
EVERYTHING. Do you claim that there is something that is good at
everything? No? I didn't think so.

By the way, I have created tables similar to this:

table: textFile
--------------
fileID
ParagraphID
SentenceID
token (word or punctuation)
tokenType (punctuation, word, non-printable, etc.)

(Non-printable characters are represented in Hex notation.)

It can be done.
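A sketch of populating and querying a table like the one above, using sqlite3. The `seq` column is an assumed addition (needed to preserve token order), and the tokenizer is a crude illustration, not topmind's actual code:

```python
import re
import sqlite3

# The token-table idea: store free text one token per row, then
# query it relationally. Column names follow the schema above.
db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE textFile (
    fileID INTEGER, paragraphID INTEGER, sentenceID INTEGER,
    seq INTEGER,          -- token order (an assumed extra column)
    token TEXT, tokenType TEXT)""")

text = "SQL is old. SQL works."
for s_id, sentence in enumerate(re.findall(r"[^.]+\.", text)):
    for seq, tok in enumerate(re.findall(r"\w+|[^\w\s]", sentence)):
        kind = "word" if tok.isalnum() else "punctuation"
        db.execute("INSERT INTO textFile VALUES (1, 1, ?, ?, ?, ?)",
                   (s_id, seq, tok, kind))

# Count word tokens per sentence with ordinary SQL.
counts = db.execute("""SELECT sentenceID, COUNT(*) FROM textFile
                       WHERE tokenType = 'word'
                       GROUP BY sentenceID ORDER BY sentenceID""").fetchall()
print(counts)  # [(0, 3), (1, 2)]
```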
Post by H. S. Lahman
Post by topmind
Post by H. S. Lahman
Post by topmind
Post by H. S. Lahman
Requirements -> 4GL -> 3GL -> Assembly -> machine code executable
Everything on the left is a specification for what is immediately to its
right. Similarly, everything to the right is a solution implementation
for the specification on its immediate left.
Well that is a bit outdated. For one, the distinction between 4GL and
3GL is fuzzy, and many compilers/interpreters don't use assembler.
My 4GL definition isn't ambiguous, which is why I like it. Reviewers of
OOA models have no difficulty recognizing implementation pollution.
Argument by authority.
I prefer to think of it as argument by rational practicality.
I didn't see it *here*.
Post by H. S. Lahman
Post by topmind
Post by H. S. Lahman
Post by topmind
Post by H. S. Lahman
Go look at an SA/D Data Flow Diagram or a UML Activity Diagram. They
express data store access at a high level of abstraction that is
independent of the actual storage mechanism. SQL, ISAM, CODASYL, gets,
or any other access mechanism, is then an implementation of that generic
access specification.
SQL is independent of the "actual storage mechanism". It is an
interface. You may not like the interface, but that is another matter.
Repeat after me: "SQL is an interface, SQL is an interface, SQL is an
interface"....
Try using SQL vs. flat files if you think it is independent of the
actual storage mechanism. (Actually, you probably could if the flat
files happened to be normalized to the RDM, but the SQL engine would be
a doozy and would have to be tailored locally to the files.) SQL
implements the RDB view of persistence and only the RDB view.
How is that different than ANY other interface? You are claiming magic
powers of UML that it simply does not have.
There is a distinction between describing an interface and designing its
semantics. UML is quite capable of describing the semantics of any
interface. Deciding what the semantics should be is quite another thing
that the developer owns.
When I have a subsystem in my application to access persistent data,
that subsystem has an interface that the rest of the application talks
to. That interface is designed around the rest of the application's
data needs, not the persistence mechanisms. It is the job of the
persistence access subsystem to convert the problem solution's data
needs into the access mechanisms de jour.
I could wrap SQL calls in functions, like that PinkScarf example I
already gave. But unless the same kind of thing is called multiple
times, it is just code bloat. Query languages can be compact in many
circumstances. Putting bloated single-use wrappers around it gets you
nothing except more code. Thus, I *can* play the same game via function
wrappers if need be.
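A minimal sketch of the kind of single-use wrapper being described (the PinkScarf example itself isn't reproduced in this thread; the `products` table and function name below are hypothetical stand-ins):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (name TEXT, color TEXT)")
conn.execute("INSERT INTO products VALUES ('scarf', 'pink'), ('hat', 'blue')")

# Single-use wrapper: an extra layer of code with no added reuse.
def get_products_by_color(color):
    return [n for (n,) in conn.execute(
        "SELECT name FROM products WHERE color = ?", (color,))]

# The inline query says the same thing without the wrapper.
inline = [n for (n,) in conn.execute(
    "SELECT name FROM products WHERE color = ?", ("pink",))]
wrapped = get_products_by_color("pink")
```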
Post by H. S. Lahman
If the persistence is an RDB, then the subsystem implementation will
<probably> use SQL. If the persistence is flat text files, it will use
the OS file manager and streaming facilities. If it is clay tablets, it
will use an OCR and stylus device driver API. That allows me to plug &
play the persistence mechanisms without touching the application
solution because it still talks to the same interface regardless of the
implementation of the subsystem.
Flat files don't have near the power. That is a poor analogy. RDBMS are
MORE than persistence. Say it over and over until it clicks in. Just
because YOU use it ONLY for persistence does not make it the only way
to build systems, just the bloated reinvent-the-wheel way JUST so that
you can claim UML purity.
Post by H. S. Lahman
IOW, the semantics of the interface to the subsystem is /designed/ at a
different level of abstraction than that of the subsystem
implementation.
Bull. UML is often a LOWER level of abstraction because it can take
much more code/language to specify the same thing.
Post by H. S. Lahman
UML doesn't care about the design process; it just
represents the results.
Yeah right. Logic languages, like Prolog, once claimed the same thing
(and in some senses were right, but programmer productivity or code
size was not objectively reduced). UML is a visual language just like
any other language, including the likes of Lab View. Language ==
Language. UML is not magic. You got nothing new.
Post by H. S. Lahman
Post by topmind
And as somebody pointed out, one can use SQL on flat files too. ODBC
drivers can be created to hook SQL to spreadsheets, flat files, etc.
Only if the data is organized around embedded identity and normalized.
Even then such drivers carry substantial overhead and tend to be highly
tailored to specific applications. IOW, you need a different driver for
every context (e.g., a spreadsheet) and then it won't be as efficient as
an access paradigm designed specifically for the storage paradigm.
So? What is the grand alternative? Of course different "devices" are
going to need different drivers. That is a given. There is no
one-size-fits-all driver. I have no idea what alternative you are
envisioning, but it is probably in the category with unicorns and
bigfoot.

(end of part one)
Christian Brunschen
2006-01-23 10:37:48 UTC
Permalink
Post by topmind
(Part 1 of reply)
Post by H. S. Lahman
Persistent data is data that is stored externally between executions of
an application. RDBs are a response to that need combined with a
requirement that access be generic (i.e., the data can be accessed by
many different applications, each with unique usage contexts). That's
what DBMSes do -- they manage persistent data storage and provide
generic, context-independent access to that data storage.
I disagree. You can use them that way, but I tend to view them as an
"attribute management system". They do well modelling "things" in the
real world (and virtual things) by keeping track of attributes of them.
They also provide important and useful services such as concurrency
management, joins (cross-referencing), sorting, and aggregation (sums,
counts, averages, etc.) Mere persistence does NOT have to include
things such as joins and aggregation, making one do them in app code
instead.
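To make the aggregation point concrete, here is a hedged sketch using SQLite from Python: one GROUP BY replaces a hand-written loop of running totals. The `orders` table and figures are invented for the example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (id INTEGER, customer TEXT, amount REAL);
INSERT INTO orders VALUES (1, 'alice', 10.0), (2, 'alice', 5.0), (3, 'bob', 7.5);
""")

# One statement does the grouping and aggregation.
totals = dict(conn.execute(
    "SELECT customer, SUM(amount) FROM orders GROUP BY customer"))

# The equivalent app-code loop, for comparison.
by_hand = {}
for _, customer, amount in conn.execute("SELECT * FROM orders"):
    by_hand[customer] = by_hand.get(customer, 0.0) + amount
```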
Back in my desktop-DB days, I created a lot of temporary tables to do
things such as joins, filtering, and aggregation for task-specific
temporary uses. The results were not kept beyond the task/module. Thus,
I was using DB tools *without* any sense of "lasting".
What would *you* call that? "Persistence" does not apply there.
You are still using a persistence mechanism; you're just choosing to not
use it for persistence. A file system is a persistence mechanism; lots of
applications put data into temporary files, which they delete after
they're done with them, sometimes because the available storage in the
filesystem is larger than in main memory (Photoshop, I believe, used to do
that, and maybe still does). This non-persistent use of the filesystem
doesn't make a filesystem any less of a persistence mechanism.

The relational model, and SQL, were developed specifically for persistent
databases. You can use them in a non-persistent manner, but that is
essentially using them contrary to their original intent and purpose.

By the way, how do you create and destroy these temporary tables? i.e.,
does the dbms manage their lifecycle for you, creating them as necessary
and removing them when you no longer need them, or do you have to perform
either or both of those steps yourself?
Post by topmind
Post by H. S. Lahman
My point in this subthread is that such responsibilities are complicated
enough in practice that one does not want the DBMS to also manage and
execute dynamic business rules and policies. IOW, the DBMS should just
mind its own store. [This thread has been a veritable hotbed of puns.
I've probably made more in this thread than I've done in the last
decade. B-)]
I agree that code does some things better and DB other things, and one
uses them *together* in a Yin-Yang fashion. They complement each other.
Relational Databases and SQL are tools for the specific task of storing,
accessing, modifying data - they are single-purpose tools (see definitions
below). They are 'general-purpose' _within their specific area_, as they
are not specific to any particular data access/storage/manipulation tasks,
but they are 'general-purpose' _only_ within their _specific_ domain,
which is the storage of data. A 'general-purpose' programming language is
one that allows one to write solutions to essentially arbitrary problems
using it, possibly with some specific exceptions (such as, 'python is a
general-purpose programming language, but due to its interpreted nature,
it shouldn't be used for writing interrupt handlers').

One thing to remember is that a RDBMS does _not_ do _anything_ that one
can't do in code on one's own - they are essentially just a pre-written
library, with a little domain-specific language as part of its interface -
whereas on the other hand, _most_ of the things you can do in _code_,
_cannot_ be done in an RDBMS.

[ ... deletia, as I know nothing about UML ... ]
Post by topmind
Post by H. S. Lahman
Post by topmind
Again, this gets into the definition of "general purpose". I agree that
query languages are not meant to do the *entire* application, but that
does not mean it is not general purpose. File systems are "general
purpose", but that does not mean that one writes an entire application
in *only* a file system. It is a general purpose *tool*, NOT intended
to be the whole enchilada.
Huh?!? If you can't write the entire application in it, then it isn't
general purpose by definition.
Post by topmind
A hammer is a general purpose tool, but that does not mean one is
supposed to ONLY use a hammer. You need to clarify your working
definition of "general purpose", and then show it the consensus
definition for 4GL.
huh**2?!? A hammer is not a general purpose tool by any stretch of the
imagination.
Okay, then what is a "general purpose tool"? If I was going to put
together a tool box for a trip where the mission details are not given
ahead of time, I would certainly pack a hammer.
Yes, but you wouldn't expect to be using the hammer _unless_ you
encountered a problem that _specifically_ included nails. If you
encountered problems that included only screws, you'd never touch it;
you'd be using your screwdriver set instead.
Post by topmind
Only an idiot would
not. No, it is not a one-size-fits-all tool, and I don't expect one.
Good apps don't need a one-size-fits-all language because they can use
yin-yang complementary tools.
*sigh* Time for some rudimentary definitions:

single-purpose
useful for a single purpose, for a specific task, only

multi-purpose
useful for a number of specific tasks, but only a limited number still

general-purpose
useful for most tasks in general, though possibly with some specific
exceptions

all-purpose
useful for absolutely everything

A hammer is _not_ a 'general-purpose tool', because it isn't intended
or useful for general tasks, but specifically for beating nails into
stuff. If it also has a back end that lets you pry nails out, it might be
considered a 'multi-purpose' tool, but even that would be a stretch,
because you're still only working on the 'nails' bit. Either way, it's not
a 'general-purpose' tool by any stretch of the mind.

Actually, there's a saying that's apt, and which I think that 'topmind'
very clearly exemplifies:

"If all you have is a hammer, everything looks like a nail"

The point of which, of course, is precisely that a hammer is _not_ a
general-purpose tool, but that if you are accustomed to working only with
a specific limited toolset (and thus the associated set of problems it is
intended to solve), there is a tendency to try to see all other problems
as if they, too, were problems of that specific type. However, that is a
fallacy, as there are problems which clearly don't fit into such a narrow
mold.

Even topmind's own comments above show that he is fundamentally aware of
this: he says that a hammer is _one of_ the tools he would pack, so he
recognizes that there are many more tasks that he might encounter, but for
which a hammer is not a useful tool. But he confuses 'general-purpose'
with 'one-size-fits-all': Those are _not_ the same.

Of course, when it comes to computers, 'general-purpose' can frequently
come very _close_ to being 'all-purpose' simply because there are very few
problems that fall outside the 'general-purpose' area.

Java is a 'general-purpose' language, because you can write all sorts of
programs in it - from math-intensive scientific number-crunching, to data
storage and access, to graphical user interfaces, to distributed systems,
to ... etc. SQL _isn't_, because there are _vast_ areas of problems that
SQL not just isn't intended to address, but simply _cannot_ address,
however much you try to make it.

The need for 'yin-yang-complementary' tools arises in cases where it is
difficult to create multi-purpose or general-purpose tools: construction,
woodworking, metalworking etc, are all places where it is difficult to
create such tools. In computing, however, such tools abound: The computer
_itself_ is a general-purpose tool. Let us not forget, in fact, that today's
digital stored-program computers are actually _general-purpose_ computers,
as opposed to earlier, single-purpose computers, that were available (and
some still are!), such as for solving differential equations.

Procedural, functional, object-oriented languages are all _general-purpose_
tools for programming computers, for writing essentially arbitrary
programs. SQL _isn't_. If it were, then why do database vendors create
languages to extend or 'hook into' the database (Oracle's PL/SQL), or
allow other languages to hook into the database (PostgreSQL allows perl
and python, as well as others, I believe)? There would be no need - *if*
SQL were a _general-purpose_ language - which it isn't.

And again, 3GL can be used to _write_ RDBMS; the converse is _not_ true.
Post by topmind
Post by H. S. Lahman
Post by topmind
Please clarify. Something can still be within a paradigm and be general
purpose. Further GP does not necessarily mean "all purpose", for
nothing is practically all purpose.
While in computing, 'general-purpose' can quite frequently get rather
close to 'all-purpose' (i.e., the set of specific problems that a
'general-purpose' language cannot solve can grow very small), there is a
difference, yes; however, the difference between a single- or
multi-purpose tool and a general-purpose tool is _also_ still there.

Certainly, tools can be within a certain paradigm, and still be
general-purpose. For instance, object-oriented languages are within the
object-oriented paradigm, and are still general-purpose; likewise,
functional programming languages (like Haskell) are within the functional
paradigm, and are still definitely general-purpose languages. However,
that is more because 'object-orientation' and 'functional programming' are
both paradigms specifically for general-purpose programming.

The relational data model is specifically intended for data storage and
access; it isn't intended to address anything beyond that. Any language
that is based on that specific model is going to be working within the
limits of what that model is intended to address. So, for instance, SQL is
great at modifying and accessing data in a relational database ... but is
not useful for anything else, because that is outside the scope of its
purpose.

So, whereas SQL is a 'general-purpose' language _within the scope of
database access_, it is a _single-purpose_ language if you view it from a
wider scope than that.
Post by topmind
Post by H. S. Lahman
SQL is designed around the RDB paradigm for persistence. It can't be
used for, say, accessing lines in a text flat file because the text file
does not organize the data the way SQL expects. So SQL is not a
general purpose interface to stored data. Apropos of your point,
though, SQL is quite general purpose for accessing /any/ data in a
uniform way from a data store _organized like an RDB_.
Well, I agree that SQL is probably not a very good way to reference
free-form text. However, just because it is not good for everything
does not mean it is not general purpose. Again, NOTHING is good at
EVERYTHING. Do you claim that there is something that is good at
everything? No? I didn't think so.
Actually, for programming _everything_, you can always use machine code.
After all, that is what everything else comes down to: Everything that
you can do on a computer, you can do in machine code.

In order for a programming language to be 'all-purpose', it would only
have to be able to express everything you can express in machine code.
There are a lot of languages that come close - those are usually
'general-purpose' languages.

SQL does _not_ come close, due to inherent limitations in SQL.
Post by topmind
table: textFile
--------------
fileID
ParagraphID
SentenceID
token (word or punctuation)
tokenType (punctuation, word, non-printable, etc.)
(Non-printable characters are represented in Hex notation.)
It can be done.
But then you have taken what was an arbitrary stream of characters and
already broken it up into a representation that better suits your specific
model. And I suspect that the code that would read an arbitrary stream of
characters and puts it into your tables, was written in something _other_
than SQL - because SQL is not good at the free-text processing that is
necessary to get the data into a tabular format, because SQL is not
general purpose.
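A sketch of that load step, assuming naive splitting rules (sentences end at periods; tokens are word runs or single punctuation marks) -- note that none of it is SQL:

```python
import re

# Turn a character stream into (sentenceID, token, tokenType) rows,
# ready for insertion into a token table. The rules here are simplistic.
def tokenize(text):
    rows = []
    for sent_id, sentence in enumerate(text.split("."), start=1):
        if not sentence.strip():
            continue
        for tok in re.findall(r"\w+|[^\w\s]", sentence):
            kind = "word" if tok.isalnum() else "punctuation"
            rows.append((sent_id, tok, kind))
        rows.append((sent_id, ".", "punctuation"))  # restore the terminator
    return rows

rows = tokenize("SQL is old. It still works.")
```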

[ ... more deletia regarding UML ... ]
Post by topmind
Post by H. S. Lahman
If the persistence is an RDB, then the subsystem implementation will
<probably> use SQL. If the persistence is flat text files, it will use
the OS file manager and streaming facilities. If it is clay tablets, it
will use an OCR and stylus device driver API. That allows me to plug &
play the persistence mechanisms without touching the application
solution because it still talks to the same interface regardless of the
implementation of the subsystem.
Flat files don't have near the power. That is a poor analogy. RDBMS are
MORE than persistence. Say it over and over until it clicks in. Just
because YOU use it ONLY for persistence does not make it the only way
to build systems, just the bloated reinvent-the-wheel way JUST so that
you can claim UML purity.
Everything that an RDBMS does, can be implemented in a 3GL using flat
files for the low-level persistence layer. How do I know this? Because
that is one way to implement a RDBMS.
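As a sketch of that claim, here is a nested-loop join over two flat CSV "files" in plain Python; a real RDBMS layers indexing, a query planner, and an SQL parser on top of the same idea. The data is invented:

```python
import csv
import io

# Two flat files, held in memory for the example.
people = io.StringIO("id,name\n1,alice\n2,bob\n")
cities = io.StringIO("person_id,city\n1,Oslo\n2,Lima\n")

left = list(csv.DictReader(people))
right = list(csv.DictReader(cities))

# Nested-loop equijoin: the 3GL building block an RDBMS is made of.
joined = [
    {"name": p["name"], "city": c["city"]}
    for p in left
    for c in right
    if p["id"] == c["person_id"]
]
```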

The relational model offers a useful, unified interface to data storage
and retrieval, with well-defined semantics - in such a way that it is a
very good solution to a specific problem; it is in fact, a
'general-purpose' solution _within its scope_ (much like a hammer is a
general-purpose tool for hitting things, and isn't inherently limited to
specific types of nails). But viewed from outside its scope, SQL only
addresses that particular problem, and thus isn't general-purpose.

The power of the combination of flat files and a 3GL, is greater than the
power of SQL - because the combination can implement all that SQL can,
_and much more besides_.

[ ... more UML deletia ... ]
Post by topmind
Post by H. S. Lahman
Post by topmind
And as somebody pointed out, one can use SQL on flat files too. ODBC
drivers can be created to hook SQL to spreadsheets, flat files, etc.
Only if the data is organized around embedded identity and normalized.
Even then such drivers carry substantial overhead and tend to be highly
tailored to specific applications. IOW, you need a different driver for
every context (e.g., a spreadsheet) and then it won't be as efficient as
an access paradigm designed specifically for the storage paradigm.
So? What is the grand alternative? Of course different "devices" are
going to need different drivers. That is a given. There is no
one-size-fits-all driver. I have no idea what alternative you are
envisioning, but it is probably in the category with unicorns and
bigfoot.
And these 'drivers' would be written in ... ? something like a 3GL,
procedural, functional or object-oriented, usually. You can't, for instance,
write a flat-file database driver in SQL, can you?

SQL, RDBMS, are all about offering a specific view of data, and a specific
standard interface to that data, to accessing it and modifying it
(insertion, deletion, updates). This solves the 'data storage' problem for
a large number of applications, and is, as such, a 'general-purpose data
storage' solution - but it addresses _only_ the data storage/retrieval
problem, whereas most other problems also have many other aspects that
need to be solved - which SQL can't do.

Best wishes,

// Christian Brunschen
topmind
2006-01-24 02:18:52 UTC
Permalink
Post by Christian Brunschen
Post by topmind
(Part 1 of reply)
Back in my desktop-DB days, I created a lot of temporary tables to do
things such as joins, filtering, and aggregation for task-specific
temporary uses. The results were not kept beyond the task/module. Thus,
I was using DB tools *without* any sense of "lasting".
What would *you* call that? "Persistence" does not apply there.
You are still using a persistence mechanism; you're just choosing to not
use it for persistence.
Again, persistence is one of MANY features of a DB.
Post by Christian Brunschen
A file system is a persistence mechanism; lots of
applications put data into temporary files, which they delete after
they're done with them, sometimes because the available storage in the
filesystem is larger than in main memory (Photoshop, I believe, used to do
that, and maybe still does). This non-persistent use of the filesystem
doesn't make a filesystem any less of a persistence mechanism.
They usually do that because the RAM system is limited, not because
they want any special features of the disk system.
Post by Christian Brunschen
The relational model, and SQL, were developed specifically for persistent
databases.
Because RAM was too expensive to consider non-persistence DB's back
then. The history is generally irrelevant. I don't care what they did
in 1965; it does not change what I do now. LISP was originally a
demonstration tool only. The author never set out to create a new
production language. But LISP fans are not going to be halted by
ancient intentions.
Post by Christian Brunschen
You can use them in a non-persistent manner, but that is
essentially using them contrary to their original intent and purpose.
DB's have a lot of features *besides* persistence, and I use DBs for
those features. Label DB's whatever you want, but I find those other
features useful. You guys are playing a label game. Focus on potential
usage, not your pet classification or what happened 50 years ago.
Post by Christian Brunschen
By the way, how do you create and destroy these temporary tables? i.e.,
does the dbms manage their lifecycle for you, creating them as necessary
and removing them when you no longer need them, or do you have to perform
either or both of those steps yourself?
Some table-oriented tools auto-delete them, some are RAM-only, while
others require manual intervention. (The ideal system would offer all
of these options.)
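SQLite's TEMP tables are one concrete example of the "auto-delete" option mentioned above: the engine drops them when the connection closes. A minimal sketch:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TEMP TABLE scratch (n INTEGER)")
conn.executemany("INSERT INTO scratch VALUES (?)", [(1,), (2,), (3,)])

# Use the temp table for a task-specific aggregate...
total = conn.execute("SELECT SUM(n) FROM scratch").fetchone()[0]

# ...then let the engine manage the lifecycle: no manual DROP needed.
conn.close()
```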
Post by Christian Brunschen
Post by topmind
I agree that code does some things better and DB other things, and one
uses them *together* in a Yin-Yang fashion. They complement each other.
Relational Databases and SQL are tools for the specific task of storing,
accessing, modifying data - they are single-purpose tools (see definitions
below). They are 'general-purpose' _within their specific area_, as they
are not specific to any particular data access/storage/manipulation tasks,
but they are 'general-purpose' _only_ within their _specific_ domain,
which is the storage of data.
Bull.
Post by Christian Brunschen
A 'general-purpose' programming language is
one that allows one to write solutions to essentially arbitrary problems
using it, possibly with some specific exceptions (such as, 'python is a
general-purpose programming language, but due to its interpreted nature,
it shouldn't be used for writing interrupt handlers').
You mean "Turing Complete". We have been over this already. I don't
expect a query language to be TC to be considered "general purpose"
because one can and perhaps should use multiple tools such that one
should not EXPECT to have one tool/language do everything. Your "do
everything" assumption is a poor assumption.
Post by Christian Brunschen
One thing to remember is that a RDBMS does _not_ do _anything_ that one
can't do in code on one's own - they are essentially just a pre-written
library, with a little domain-specific language as part of its interface -
whereas on the other hand, _most_ of the things you can do in _code_,
_cannot_ be done in an RDBMS.
Irrelevant. I don't assume a single tool has to carry the entire load
of an app.
Post by Christian Brunschen
Post by topmind
Post by H. S. Lahman
Post by topmind
A hammer is a general purpose tool, but that does not mean one is
supposed to ONLY use a hammer. You need to clarify your working
definition of "general purpose", and then show it the consensus
definition for 4GL.
huh**2?!? A hammer is not a general purpose tool by any stretch of the
imagination.
Okay, then what is a "general purpose tool"? If I was going to put
together a tool box for a trip where the mission details are not given
ahead of time, I would certainly pack a hammer.
Yes, but you wouldn't expect to be using the hammer _unless_ you
encountered a problem that _specifically_ included nails.
Not true. They are good for pounding things into place, pounding them
out of being stuck, prying things off (with the back end), etc.
Post by Christian Brunschen
If you
encountered problems that included only screws, you'd never touch it;
you'd be using your screwdriver set instead.
By that definition nothing can be "general purpose" because you are
incorrectly looking for an all-in-one tool. (A Swiss Army Knife is
actually a collection of many tools, not one tool.)
Post by Christian Brunschen
Post by topmind
Only an idiot would
not. No, it is not a one-size-fits-all tool, and I don't expect one.
Good apps don't need a one-size-fits-all language because they can use
yin-yang complementary tools.
single-purpose
useful for a single purpose, for a specific task, only
multi-purpose
useful for a number of specific tasks, but only a limited number still
general-purpose
useful for most tasks in general, though possibly with some specific
exceptions
Okay, but even your own def does NOT require it to carry the entire
load, but simply be *used* for most things.
Post by Christian Brunschen
all-purpose
useful for absolutely everything
A hammer is _not_ a 'general-purpose tool', because it isn't intended
or useful for general tasks, but specifically for beating nails into
stuff.
I consider the equivalent of an app to be a "project", not a task. For
an entire project, such as building a car or a house, most likely you
will need a hammer at least once (even if you don't use nails).
Post by Christian Brunschen
If it also has a back end that lets you pry nails out, it might be
considered a 'multi-purpose' tool, but even that would be a stretch,
because you're still only working on the 'nails' bit. Either way, it's not
a 'general-purpose' tool by any stretch of the mind.
Is there such a thing WRT physical tools by your def?
Post by Christian Brunschen
Actually, there's a saying that's apt, and which I think that 'topmind'
"If all you have is a hammer, everything looks like a nail"
Like maybe.......Java?
Post by Christian Brunschen
The point of which, of course, is precisely that a hammer is _not_ a
general-purpose tool, but that if you are accustomed to working only with
a specific limited toolset (and thus the associated set of problems it is
intended to solve), there is a tendency to try to see all other problems
as if theyy, too, were problems of that specific type. However, that is a
fallacy, as there are problems which clearly don't fit into such a narrow
mold.
That can apply to anybody. If you are truly experienced such that you
tried them all, then you could point to multiple areas where RDB'S
stink.
Post by Christian Brunschen
Even topmind's own comments above show that he is fundamentally aware of
this: he says that a hammer is _one of_ the tools he would pack, so he
recognizes that there are many more tasks that he might encounter, but for
which a hammer is not a useful tool. But he confuses 'general-purpose'
with 'one-size-fits-all': Those are _not_ the same.
No, it appears to be you who is making that part of the definition.
Post by Christian Brunschen
Of course, when it comes to computers, 'general-purpose' can frequently
come very _close_ to being 'all-purpose' simply because there are very few
problems that fall outside the 'general-purpose' area.
Java is a 'general-purpose' language, because you can write all sorts of
programs in it - from math-intensive scientific number-crunching, to data
storage and access, to graphical user interfaces, to distributed systems,
to ... etc. SQL _isn't_, because there are _vast_ areas of problems that
SQL not just isn't intended to address, but simply _cannot_ address,
however much you try to make it.
Addressed above.
Post by Christian Brunschen
The need for 'yin-yang-complementary' tools arises in cases where it is
difficult to create multi-purpose or general-purpose tools: construction,
woodworking, metalworking etc, are all places where it is difficult to
create such tools.
Are you saying that in Java you don't need DB's because you can write
your own concurrency management, join systems, aggregation systems,
etc? Perhaps, but that just means that you reinvented the DB the hard
way, and a proprietary one that new comers have to learn from scratch.
Reuse is out the door.

[snip]
Post by Christian Brunschen
And again, 3GL can be used to _write_ RDBMS; the converse is _not_ true.
Irrelevant.
Post by Christian Brunschen
The relational data model is specifically intended for data storage and
access; it isn't intended to address anything beyond that.
If that was true, then they would be happy with a file system alone.
Besides, original intentions can mean squat, per above.

[snip]

[re: Text schema example]
Post by Christian Brunschen
But then you have taken what was an arbitrary stream of characters and
already broken it up into a representation that better suits your specific
model.
Nothing wrong with that.
Post by Christian Brunschen
And I suspect that the code that would read an arbitrary stream of
characters and puts it into your tables, was written in something _other_
than SQL - because SQL is not good at the free-text processing that is
necessary to get the data into a tabular format, because SQL is not
general purpose.
You mean not Turing Complete, per above. Something does not have to be
TC to be "general purpose". Another example would be Regular
Expressions. They are not TC, yet are a general purpose string
matching/finding tool. (Maybe not the best either, but that is another
story.)
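A quick illustration of that point about regular expressions, using Python's `re` module (the date-matching task is an arbitrary example): the pattern language is not Turing-complete, yet it handles general string matching.

```python
import re

# A non-Turing-complete language doing useful, general-purpose matching.
text = "shipped 2006-01-24, due 2006-02-01"
dates = re.findall(r"\d{4}-\d{2}-\d{2}", text)
```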

[snip]
Post by Christian Brunschen
// Christian Brunschen
-T-
Christian Brunschen
2006-01-24 10:35:35 UTC
Permalink
Post by topmind
Post by Christian Brunschen
Post by topmind
What would *you* call that? "Persistence" does not apply there.
You are still using a persistence mechanism; you're just choosing to not
use it for persistence.
Again, persistence is one of MANY features of a DB.
Well, the features of a relational database are primarily that it allows
you to store data, organised as rows and columns in tables according (more
or less) to the relational model, and and it does so in a persistent
manner (what you put into the database isn't going to disappear unless you
explicitly remove it).

If we ignore the persistence aspect, what remains is the organization of
data according to the relational model. That's certainly useful, but it's
not 'MANY' features. It's a very useful one, true, but you are overstating
it a bit.

Nevertheless, persistence is considered one of the cornerstones of
RDBMS:es, and one thing that RDBMS:es are expected to offer.
Post by topmind
Post by Christian Brunschen
A file system is a persistence mechanism; lots of
applications put data into temporary files, which they delete after
they're done with them, sometimes because the available storage in the
filesystem is larger than in main memory (Photoshop, I believe, used to do
that, and maybe still does). This non-persistent use of the filesystem
doesn't make a filesystem any less of a persistence mechanism.
They usually do that because the RAM system is limited, not because
they want any special features of the the disk system.
But the filesystem remains a persistence mechanism, even though it has
been used for its 'size' aspect rather than its 'persistence' aspect. So,
the mere fact that you can use a database in a non-persistent manner
doesn't make it any less of a persistence mechanism.
Post by topmind
Post by Christian Brunschen
The relational model, and SQL, were developed specifically for persistent
databases.
Because RAM was too expensive to consider non-persistence DB's back
then.
Interesting assertion - do you have anything to back it up with? From
everything that I have read, it has been extremely clear that the
relational model was developed for _persistent_ databases, not for
_transient_ ones.
Post by topmind
The history is generally irrelevant.
I disagree: History is never entirely irrelevant, even if only to show
where something came from.
Post by topmind
I don't care what they did
in 1965; It does not change what I do now.
But it may change how one can _view_ and _describe_ what you do now, and
indeed, the tools you use.
Post by topmind
LISP was originally a
demonstration tool only. The author never set out to create a new
production language. But LISP fans are not going to be halted by
ancient intentions.
Yes, lots of things end up being used differently than originally
intended. However, if you look around, I think you will see that the
_vast_ majority of uses of databases are, in fact, for _persistent_
storage of data. So it's not just 'ancient history', it is also _current
usage_.
Post by topmind
Post by Christian Brunschen
You can use them in a non-persistent manner, but that is
essentially using them contrary to their original intent and purpose.
DB's have a lot of features *besides* persistence, and I use DBs for
those features. Label DB's whatever you want, but I find those other
features useful. You guys are playing a label game. Focus on potential
usage, not your pet classification or what happened 50 years ago.
Well, the potential usage of relational databases is the storage,
organization and access to (persistent or non-persistent) data. that still
doesn't solve the _vast_ majority of problems out there, because you
usually have to _do something_ with the data (process it somehow), which
SQL doesn't do.
Post by topmind
Post by Christian Brunschen
By the way, how do you create and destroy these temporary tables? i.e.,
does the dbms manage their lifecycle for you, creating them as necessary
and removing them when you no longer need them, or do you have to perform
either or both of those steps yourself?
Some table-oriented tools auto-delete them, some are RAM-only, while
others require manual intervention. (The ideal system would offer all
of these options.)
Cool.
Post by topmind
Post by Christian Brunschen
Post by topmind
I agree that code does some things better and DB other things, and one
uses them *together* in a Yin-Yang fashion. They complement each other.
Relational Databases and SQL are tools for the specific task of storing,
accessing, modifying data - they are single-purpose tools (see definitions
below). They are 'general-purpose' _within their specific area_, as they
are not specific to any particular data access/storage/manipulation tasks,
but they are 'general-purpose' _only_ within their _specific_ domain,
which is the storage of data.
Bull.
Ah, such eloquence, such precise refutation of the points I raise.
Post by topmind
Post by Christian Brunschen
A 'general-purpose' programming language is
one that allows one to write solutions to essentially arbitrary problems
using it, possibly with some specific exceptions (such as, 'python is a
general-purpose programming language, but due to its interpreted nature,
it shouldn't be used for writing interrupt handlers').
You mean "Turing Complete".
In a programming context, you can't get much more 'general-purpose' than
'turing complete'. But 'general-purpose' does not, per se, imply 'turing
complete': it simply implies that something is useful not for a specific
task, but for problems _in general_.

Turing-complete languages will be general-purpose. But there may well be
general-purpose languages which aren't turing-complete.
Post by topmind
We have been over this already. I don't
expect a query language to be TC to be considered "general purpose"
because one can and perhaps should use multiple tools such that one
should not EXPECT to have one tool/language do everything. Your "do
everything" assumption is a poor assumption.
The 'do everything' assumption may not sit well with you, but the fact is
that there are _lots_ of tools (programming languages) that _can_ 'do
everything' (inasmuch as it is possible on a computer), and that SQL isn't
one of them. More to the point, however, there are similarly _lots_ of
different paradigms (procedural, functional, object-oriented) that
likewise 'can do it all', but the relational model _also_ isn't one of
them. And SQL is limited _by design_ to _not_ address any aspects other
than the data storage and access ones, so it doesn't even attempt to be
anything other than a single-purpose language.
Post by topmind
Post by Christian Brunschen
One thing to remember is that a RDBMS does _not_ do _anything_ that one
can't do in code on one's own - they are essentially just a pre-written
library, with a little domain-specific language as part of its interface -
whereas on the other hand, _most_ of the things you can do in _code_,
_cannot_ be done in an RDBMS.
Irrelevant. I don't assume a single tool has to carry the entire load
of an app.
You're basically assuming, though, that you _need_ to combine different
tools. Well, that is true _if and only if_ you are focused on using at
least one tool that cannot solve the whole issue (such as, if you are
wedded to the idea of using a RDBMS). If you allow yourself to include in
your toolchest such tools that _can_ indeed address the whole problem,
then you do not _need_ to combine different tools. (You may still choose
to, for a variety of reasons, but you do not _need_ to.)
Post by topmind
Post by Christian Brunschen
Post by topmind
Post by H. S. Lahman
Post by topmind
A hammer is a general purpose tool, but that does not mean one is
supposed to ONLY use a hammer. You need to clarify your working
definition of "general purpose", and then show it the consensus
definition for 4GL.
huh**2?!? A hammer is not a general purpose tool by any stretch of the
imagination.
Okay, then what is a "general purpose tool"? If I was going to put
together a tool box for a trip where the mission details are not given
ahead of time, I would certainly pack a hammer.
Yes, but you wouldn't expect to be using the hammer _unless_ you
encountered a problem that _specifically_ included nails.
Not true. They are good for pounding things into place, pounding them
out of being stuck, prying things off (with the back end), etc.
OK, that is just generalizing from 'nails' to 'something' - the point is
still that it is a tool specifically for the purpose of _hitting_
something. Actually, if you have a hammer with one of those prying
back-ends, you actually have a combination tool, where you've turned the
back end of the hammer into a part that is traditionally on a crowbar. The
proper 'hammer' bit - the front end - is intended for the purpose of
hitting things, with a suitable amount of stored energy (which is why a
hammer has a reasonably heavy head and a reasonably long handle).
Post by topmind
Post by Christian Brunschen
If you
encountered problems that included only screws, you'd never touch it;
you'd be using your screwdriver set instead.
By that definition nothing can be "general purpose" because you are
incorrectly looking for an all-in-one tool. (A Swiss Army Knife is
actually a collection of many tools, not one tool.)
With woodworking, etc, it is indeed difficult to create 'general-purpose'
tools, precisely because there are so many different tasks there that may
need to be done, and for many of those tasks, specific, single-purpose
tools are the best things to do them. Fortunately, with computers, we have
an environment where we really _have_ general-purpose tools (programming
languages) - that is one thing that sets computers apart from many other
areas.
Post by topmind
Post by Christian Brunschen
Post by topmind
Only an idiot would
not. No, it is not a one-size-fits-all tool, and I don't expect one.
Good apps don't need a one-size-fits-all language because they can use
yin-yang complimentary tools.
single-purpose
useful for a single purpose, for a specific task, only
multi-purpose
useful for a number of specific tasks, but only a limited number still
general-purpose
useful for most tasks in general, though possibly with some specific
exceptions
Okay, but even your own def does NOT require it to carry the entire
load, but simply be *used* for most things.
I actually wrote _useful_, not _used_, if you read carefully. And
'useful' implies 'able to address problems'. i.e., yes, to 'carry the
entire weight' if necessary.
Post by topmind
Post by Christian Brunschen
all-purpose
useful for absolutely everything
A hammer is _not_ a 'general-purpose tool', because it isn't intended
or useful for general tasks, but specifically for beating nails into
stuff.
I consider the equivalent of an app to be a "project", not a task.
Well, a 'project' falls apart into many 'tasks'.
Post by topmind
For
an entire project, such as building a car or a house, most likely you
will need a hammer at least once (even if you don't use nails).
... whereas for writing a large application, you might never need anything
beyond a single programming language, if that programming language is a
general-purpose one.

You're just showing that your 'hammer' analogy is falling apart, because
of the differences between physical tools and software development ones.
Post by topmind
Post by Christian Brunschen
If it also has a back end that lets you pry nails out, it might be
considered a 'multi-purpose' tool, but even that would be a stretch,
because you're still only working on the 'nails' bit. Either way, it's not
a 'general-purpose' tool by any stretch of the mind.
Is there such a thing WRT physical tools by your def?
As I mentioned above, in many physical disciplines, it is extremely
difficult to put together anything that is a general-purpose tool,
precisely because the tasks are so completely distinct.
Post by topmind
Post by Christian Brunschen
Actually, there's a saying that's apt, and which I think 'topmind' should bear in mind:
"If all you have is a hammer, everything looks like a nail"
Like maybe.......Java?
My 'toolchest' includes C, C++, Objective-C, Logo, Pascal, Simula,
Smalltalk, Miranda, Haskell, Perl, Python and SQL (among other things).
Post by topmind
Post by Christian Brunschen
The point of which, of course, is precisely that a hammer is _not_ a
general-purpose tool, but that if you are accustomed to working only with
a specific limited toolset (and thus the associated set of problems it is
intended to solve), there is a tendency to try to see all other problems
as if they, too, were problems of that specific type. However, that is a
fallacy, as there are problems which clearly don't fit into such a narrow
mold.
That can apply to anybody. If you are truly experienced such that you
tried them all, then you could point to multiple areas where RDBs
stink.
RDBs stink for writing flight simulators. Ever tried it?

And I'm not just talking about the lack of support for joysticks or 3-d
graphics. A flight simulator _could_ use a relational database to store
the state of the simulated world, etc, but they end up being written using
specialized data structures - because the relational data model doesn't
fit well with the way one would model a plane, the world it flies in, etc.
Yes, you _could_ store it in a relational database, but direct references
between different parts of the plane, the different parts that make up the
environment, etc, simply work better.
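
The contrast described here can be sketched in a few lines of Python (the
class and attribute names are purely hypothetical, not from any real
simulator): each part holds a direct reference to its subparts, so traversal
is a field access rather than a foreign-key join.

```python
# Hypothetical sketch of a simulator's object graph: parts hold direct
# references to one another, so 'wing -> aileron' is a field access.
class Aileron:
    def __init__(self):
        self.deflection = 0.0  # degrees

class Wing:
    def __init__(self):
        self.aileron = Aileron()  # direct reference, no join required

plane_wing = Wing()
plane_wing.aileron.deflection = 5.0

# A relational layout would instead model this link as a foreign key
# (e.g. an aileron row carrying a wing id) and recover it with a join
# on every access.
print(plane_wing.aileron.deflection)
```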

There are also many problems where the choice of using a RDBMS is made
simply because it is there and thus readily available, i.e., just for
reuse purposes, rather than because the data fits particularly well into
the relational model.

My main use of SQL and relational databases is actually for persistence -
often through an object-relational mapping tool (Apple's 'Enterprise
Objects', part of their WebObjects product). This allows me to use the key
strengths of RDBMS:es - persistence, organization of data, and through SQL
a relatively standard interface - while also allowing me to use the
strengths of OO, for implementing the business logic for those objects
that are stored in the database. But make no mistake: The database is a
_useful_ part for storing the data, but it isn't a _necessary_ part. Other
persistence mechanisms could be used.
Post by topmind
Post by Christian Brunschen
Even topmind's own comments above show that he is fundamentally aware of
this: he says that a hammer is _one of_ the tools he would pack, so he
recognizes that there are many more tasks that he might encounter, but for
which a hammer is not a useful tool. But he confuses 'general-purpose'
with 'one-size-fits-all': Those are _not_ the same.
No, it appears to be you who is making that part of the definition.
Huh? You are the one who keeps trotting out the phrase 'one-size-fits-all'
whenever someone else mentions 'general-purpose'. I am trying to show that
they are _distinct_.
Post by topmind
Post by Christian Brunschen
Of course, when it comes to computers, 'general-purpose' can frequently
come very _close_ to being 'all-purpose' simply because there are very few
problems that fall outside the 'general-purpose' area.
Java is a 'general-purpose' language, because you can write all sorts of
programms in it - from math-intensive scientific number-crunching, to data
storage and access, to graphical user interfaces, to distributed systems,
to ... etc. SQL _isn't_, because there are _vast_ areas of problems that
SQL not just isn't intended to address, but simply _cannot_ address,
however much you try to make it.
Addressed above.
Well, you've commented above, but not actually _addressed_ it.
Post by topmind
Post by Christian Brunschen
The need for 'yin-yang-complementary' tools arises in cases where it it
difficult to create multi-purpose or general-purpose tools: construction,
woodworking, metalworking etc, are all places where it is difficult to
create such tools.
Are you saying that in Java you don't need DB's because you can write
your own concurrency management, join systems, aggregation systems,
etc?
Exactly: You don't _need_ databases, per se, when using Java, because you
can write your own, if necessary. And indeed, for some applications, you
can write something that will work _better_ for that application than
using a RDBMS.
Post by topmind
Perhaps, but that just means that you reinvented the DB the hard
way, and a proprietary one that newcomers have to learn from scratch.
Reuse is out the door.
Or one could just use one that has already been written - in Java:

<http://db.apache.org/derby/>

The point isn't that one _should_ rewrite everything from scratch: The
point is that one _could_. And indeed sometimes there is value in it:
Derby is a reimplementation, from scratch, in Java, of an RDBMS. It is now
available to use in Java.
Post by topmind
Post by Christian Brunschen
And again, 3GL can be used to _write_ RDBMS; the converse is _not_ true.
Irrelevant.
Only to you.
Post by topmind
Post by Christian Brunschen
The relational data model is specifically intended for data storage and
access; it isn't intended to address anything beyond that.
If that was true, then they would be happy with a file system alone.
Besides, original intentions can mean squat, per above.
What precisely besides data storage and access is the relational model
about? You should take note that I wasn't referring to persistence above.
It's certainly not about computations or business logic or user
interaction, for instance ...
Post by topmind
Post by Christian Brunschen
But then you have taken what was an arbitrary stream of characters and
already broken it up into a representation that better suits your specific
model.
Nothing wrong with that.
But it shows that you cannot use a relational database alone as a
replacement for text files: You also need some form of parser to separate
the text file into parts that you can put into the relational database.
Post by topmind
Post by Christian Brunschen
And I suspect that the code that would read an arbitrary stream of
characters and put it into your tables, was written in something _other_
than SQL - because SQL is not good at the free-text processing that is
necessary to get the data into a tabular format, because SQL is not
general purpose.
You mean not Turing Complete, per above.
Not necessarily - please don't put words into my mouth. The point is that
SQL was never intended to work with arbitrary streams of characters - it
was intended to work with arbitrary tables (and rows and columns).
Post by topmind
Something does not have to be
TC to be "general purpose". Another example would be Regular
Expressions. They are not TC, yet are a general purpose string
matching/finding tool. (Maybe not the best either, but that is another
story.)
Um, no. You wrote 'general purpose string matching/finding tool': Even by
your own writing they are 'general purpose' _only_ within their specific
domain, 'string matching/finding' - in exactly the same way as SQL is
'general purpose' _only_ within its own domain, 'accessing data in a
relational database'.

The point here isn't turing-completeness, actually: it's that some things
only apply to _specific_ domains (such as regular expressions, which apply
to the domain of strings; and SQL, which applies to the domain of
relational databases), whereas other things are defined to apply _in
general_ (such as object-orientation, functional programming, procedural
programming, etc). This is a very important distinction.
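
This distinction can be illustrated with Python's `re` module (the choice of
host language here is arbitrary): the regular expression is 'general purpose'
over strings, but anything beyond matching, such as arithmetic on the
matches, falls back to the host language, much as SQL hands non-relational
work back to a 3GL.

```python
import re

# Regular expressions are 'general purpose' only within the string
# domain: they can locate every number in arbitrary text...
text = "order 12 shipped, order 7 pending, order 40 cancelled"
numbers = re.findall(r"\d+", text)

# ...but arithmetic on the matches has to happen in the host language,
# just as SQL delegates non-relational work to a 3GL.
total = sum(int(n) for n in numbers)
print(numbers, total)
```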
Post by topmind
-T-
// Christian Brunschen
Oliver Wong
2006-01-24 16:40:39 UTC
Permalink
Post by Christian Brunschen
Post by topmind
The history is generally irrelavent.
I disagree: History is never entirely irrelevant, even if only to show
where something came from.
topmind claimed history is *generally* irrelevant, not *entirely*
irrelevant. "If only to show where something came from" does not contradict
"generally irrelevant".
Post by Christian Brunschen
Turing-complete languages will be general-purpose. But there may well be
genera-purpose languages which aren't turing-complete.
The "infinitely long ribbon of 1s and 0s that Turing Machines read" is a
turing-complete programming language, but I don't think it's a
"general-purpose" language. AFAIK, their purpose has thus far been limited
to proving theorems (about the Turing Completeness, or lack thereof, of
other abstractions, for example).

So just because a language is TC does not necessarily imply that it is
general purpose, in the sense that this thread seems to be using "general
purpose".

- Oliver
frebe
2006-01-24 18:34:26 UTC
Permalink
Post by Christian Brunschen
If we ignore the persistence aspect, what remains is the organization of
data according to the relational model. That's certainly useful, but it's
not 'MANY' features.
Here are my top four non-persistence related features that I use every
day. I think they are very important.
* Queries.
* Transactions.
* Referential integrity
* Caching.
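
As a minimal sketch of two of these features working with no persistence at
all, Python's bundled sqlite3 in `:memory:` mode can stand in for an RDBMS
(the schema below is invented purely for illustration):

```python
import sqlite3

# An in-memory database: nothing is persisted, yet transactions and
# referential integrity still apply.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE dept (id INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE emp (name TEXT, dept_id INTEGER REFERENCES dept(id))")
conn.execute("INSERT INTO dept VALUES (1)")
conn.commit()

# Referential integrity: a dangling reference is rejected by the engine.
fk_rejected = False
try:
    conn.execute("INSERT INTO emp VALUES ('bob', 99)")
except sqlite3.IntegrityError:
    fk_rejected = True

# Transactions: a failed unit of work rolls back atomically.
try:
    with conn:  # commits on success, rolls back on exception
        conn.execute("INSERT INTO emp VALUES ('alice', 1)")
        raise RuntimeError("simulated mid-transaction failure")
except RuntimeError:
    pass

print(fk_rejected, conn.execute("SELECT COUNT(*) FROM emp").fetchone()[0])
```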
Post by Christian Brunschen
Nevertheless, persistence is considered one of the cornerstones of
RDBMS:es, and one thing that RDBMS:es are expected to offer.
Do you have anything to back it up with?
Post by Christian Brunschen
However, if you look around, I think you will see that the
_vast_ majority of uses of databases are, in fact, for _persistent_
storage of data.
Only in the OO world. In the rest of the world there are many examples
of the opposite.

Fredrik Bertilsson
http://butler.sourceforge.net
Christian Brunschen
2006-01-24 19:24:59 UTC
Permalink
Post by frebe
Post by Christian Brunschen
If we ignore the persistence aspect, what remains is the organization of
data according to the relational model. That's certainly useful, but it's
not 'MANY' features.
Here are my top four non-persistence related features that I use every
day. I think they are very important.
* Queries.
Those I'd generally consider to be part of the relational model - the
queries are how you access data, perform joins (which are part of the
relational model), etc.
Post by frebe
* Transactions.
Those are definitely useful, but are not specific to databases in
general, nor to relational ones specifically.
Post by frebe
* Referential integrity
... has to be maintained somehow: dangling references would be a problem
in any system. SQL can prevent you from putting the database into an
inconsistent state, true, but you still have to manually keep the
references up-to-date, just as in any other system.
Post by frebe
* Caching.
Caching also isn't specific to RDBMS:es. If you're using a modern OS, you
can hardly _avoid_ some level of caching these days, even if you're using
flat files!

So, all of these things you mention above are, in my view, either direct
consequences of the relational model, or features that are not specific to
relational databases (and could thus be available to you using a
non-relational system for storing and accessing data). Do you strongly
disagree?
Post by frebe
Post by Christian Brunschen
Nevertheless, persistence is considered one of the cornerstones of
RDBMS:es, and one thing that RDBMS:es are expected to offer.
Do do you have anything to back it up with?
Not really - only my general experience with databases, which I admit is
not _hugely_ extensive. I am willing to be corrected on this point :) Do
you have any examples of relational databases that have specific features
for non-persistent usage? All the relational databases I've looked at
(again, a limited number) appear to put a lot of weight on the persistence
aspect.
Post by frebe
Post by Christian Brunschen
However, if you look around, I think you will see that the
_vast_ majority of uses of databases are, in fact, for _persistent_
storage of data.
Only in the OO world. In the rest of the world there are many examples
of the opposite.
I'm open to be educated on the subject - please, could you point me at
some examples?

[ In my experience, even when I was developing procedural systems, in
those systems, relational databases were used to work with persistent
data. Transient data was generally stored in bespoke data structures in
memory. This is without any OO involved. ]
Post by frebe
Fredrik Bertilsson
http://butler.sourceforge.net
Best wishes,

// Christian Brunschen
frebe
2006-01-25 07:11:09 UTC
Permalink
Post by Christian Brunschen
Post by frebe
* Queries.
Those I'd generally consider to be part of the relational model
Correct.
Post by Christian Brunschen
Post by frebe
* Transactions.
Those are definitely useful, but are not specific to databases
Correct.
Post by Christian Brunschen
Post by frebe
* Referential integrity
... has to be maintained somehow: dangling references would be a problem
in any system
Post by frebe
* Caching.
Caching also isn't specific to RDBMS:es.
Correct.

My point is that these features are useful non-persistence features
provided by a DBMS.

You claimed: "If we ignore the persistence aspect, what remains is the
organization of data according to the relational model. That's
certainly useful, but it's not 'MANY' features. "

I claim that a (R)DBMS provides MANY useful non-persistence related
features. I don't claim that a RDBMS is the only product that may
provide such features, but currently, for an average enterprise
application, an RDBMS is the best available product to provide these
features.
Post by Christian Brunschen
Do you have any examples of relational databases that have specific features
for non-persistent usage? All the relational databases I've looked at
(again, a limited number) appear to put a lot of weight on the persistence
aspect.
I just gave you four examples. If you are asking for a RDBMS product
that is suitable for all-in-RAM use, look at hsqldb.
Post by Christian Brunschen
Post by frebe
Post by Christian Brunschen
However, if you look around, I think you will see that the
_vast_ majority of uses of databases are, in fact, for _persistent_
storage of data.
Only in the OO world. In the rest of the world there are many examples
of the opposite.
I'm open to be educated on the subject - please, could you point me at
some examples?
If you look at enterprise applications outside the OO world, you will
find that they heavily use embedded SQL. Instead of loading the data
into memory structures, a select statement is used every time some data
is needed. The RDBMS is configured to cache most of the data needed,
into memory. It means that the application asks the RDBMS for data that
resides in RAM. In this case, the persistence features are not involved
at all.

Another example is the use of transactions. This feature is not related
to persistence and enterprise applications use them a lot.
Post by Christian Brunschen
In my experience, even when I was developing procedural systems, in
those systems, relational databases where used to work with persistent
data. Transient data was generally stored in bespoke data structures in
memory. This is without any OO involved.
This is true for some kinds of applications, but normally not for
enterprise applications. Look at an old COBOL program. How advanced are
the data structures in COBOL? Almost the only thing you can do is to
traverse an array. All other kinds of searches have to be done using a
select statement. Still, COBOL was a very popular language, so the
concept of letting the DB take care of the collection handling was
probably not a very bad idea.

The concept of loading data into advanced structures instead of
making select calls was originally motivated by performance. Currently
I work on a scheduling application where I have to use this
concept. The result is bloated and messy code using TreeMap, HashMap,
ArrayList, etc., and every minute I wish I could make a select
statement instead. But the time overhead of the interprocess
communication is simply too high. (One solution would indeed be to use
hsqldb as an all-in-RAM, in-process DB.) But I strongly argue for
using this concept only when performance reasons force you to. Using
select statements will give you much less bloated code.
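
This trade-off can be sketched with Python's sqlite3 in `:memory:` mode
standing in for an in-process, all-in-RAM engine such as hsqldb (an
assumption on my part; the scheduling schema below is invented for
illustration):

```python
import sqlite3

# An in-process, all-in-RAM table: the select below does the work of the
# TreeMap/HashMap index structures one would otherwise maintain by hand,
# with no interprocess communication to pay for.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE task (id INTEGER, machine TEXT, start INTEGER)")
db.executemany("INSERT INTO task VALUES (?, ?, ?)",
               [(1, "m1", 10), (2, "m2", 5), (3, "m1", 7)])

# "All tasks on machine m1, earliest first" -- one declarative query
# instead of a hand-built sorted map keyed on (machine, start).
m1_tasks = [row[0] for row in db.execute(
    "SELECT id FROM task WHERE machine = ? ORDER BY start", ("m1",))]
print(m1_tasks)
```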

Fredrik Bertilsson
http://butler.sourceforge.net
Christian Brunschen
2006-01-25 09:53:41 UTC
Permalink
Post by frebe
Post by Christian Brunschen
Post by frebe
* Queries.
Those I'd generally consider to be part of the relational model
Correct.
Post by Christian Brunschen
Post by frebe
* Transactions.
Those are definitely useful, but are not specific to databases
Correct.
Post by Christian Brunschen
Post by frebe
* Referential integrity
... has to be maintained somehow: dangling references would be a problem
in any system
Post by frebe
* Caching.
Caching also isn't specific to RDBMS:es.
Correct.
My point is that these features are useful non-persistence features
provided by a DBMS.
You claimed: "If we ignore the persistence aspect, what remains is the
organization of data according to the relational model. That's
certainly useful, but it's not 'MANY' features. "
I claim that a (R)DBMS provides MANY useful non-persistence related
features.
Well, the above are 'a few' or even 'several', but not 'many'. One of them
- queries - is really a part of the relational model itself (whether you
use a query language or some other interface is immaterial). Transactions,
as offered by RDBMS:es, are limited to the data stored within the RDBMS, so
that if you want to use its transaction capability, you need to store the
appropriate data in the RDBMS. Referential integrity, as I said, you still
have to maintain yourself: The RDBMS only catches and flags up errors (or
handles them in another way, which you must have told the RDBMS to use).
Caching is something that you get 'for free' in many other approaches as
well.
Post by frebe
I don't claim that a RDBMS is the only product that may
provide such features, but currently, for an average enterprise
application, an RDBMS is the best available product to provide these
features.
Certainly a RDBMS gives you a useful collection of things to use; but if
you wanted to use one or maybe two of them _without_ wanting to use its
data storage model, then using the RDBMS won't help you, because it
doesn't offer transactions, caching, or referential integrity support for
anything other than what is handled within its data model.

So, if you want to use some of those features, without using the data
model, an RDBMS would most likely _not_ be the best choice.
Post by frebe
Post by Christian Brunschen
Do you have any examples of relational databases that have specific features
for non-persistent usage? All the relational databases I've looked at
(again, a limited number) appear to put a lot of weight on the persistence
aspect.
I just gave you four examples.
None of those features are 'specific features for non-persistent usage':
caching, for instance, is probably more appropriate for persistent than
for non-persistent usage (if the entire thing is already in memory, what
needs to be cached?); both queries, transactions and referential integrity
are just as applicable to persistent as to non-persistent data, so also
are not 'non-persistent use specific'.
Post by frebe
If you are asking for a RDBMS product
that is suitable for all-in-RAM use, look at hsqldb.
Cool, I will. [ ... a quick look later ... ] Looks very interesting, and
indeed offers primarily memory-based tables (as well as 'cached' ones, for
datasets that need to persist, or simply exceed the size of available
memory). Thanks for the pointer!
Post by frebe
Post by Christian Brunschen
Post by frebe
Post by Christian Brunschen
However, if you look around, I think you will see that the
_vast_ majority of uses of databases are, in fact, for _persistent_
storage of data.
Only in the OO world. In the rest of the world there are many examples
of the opposite.
I'm open to be educated on the subject - please, could you point me at
some examples?
If you look at enterprise applications outside the OO world, you will
find that they heavily use embedded SQL.
Please, give me some more specific pointers.
Post by frebe
Instead of loading the data
into memory structures, a select statement is used every time some data
is needed. The RDBMS is configured to cache most of the data needed,
into memory. It means that the application asks the RDBMS for data that
resides in RAM. In this case, the persistence features are not involved
at all.
So this would be entirely for data that is transient (i.e., the data in
that database is not deliberately kept around between executions)?
Post by frebe
Another example is the use of transactions. This feature is not related
to persistence and enterprise applications use them a lot.
But neither are transactions, as offered by RDBMS:es, applicable to things
outside the RDBMS's scope - i.e., outside the data in the RDBMS. So, if
all you want is transactions, an RDBMS probably shouldn't be the first
place to go.
Post by frebe
Post by Christian Brunschen
In my experience, even when I was developing procedural systems, in
those systems, relational databases were used to work with persistent
data. Transient data was generally stored in bespoke data structures in
memory. This is without any OO involved.
This is true for some kind of applications, but normally not for
enterprise applications.
What precisely is your definition of an 'enterprise application'? I've
often thought of them as frequently working with large sets of data, often
data that is already in a database. Certainly in such a way that
temporary data is created that also needs to be managed, and because the
input and output may be coming from databases, using a database for the
temporary data would make an eminent amount of sense (keeps all the data
handling similar, reusable, recognisable, easier to maintain); but from
your statement above it sounds like you would characterise enterprise
applications as using databases _not_ for incoming or outgoing data, but
_mainly_ for transient data used only in the process of whatever they are
doing?
Post by frebe
Look at an old COBOL program. How advanced are
the data structures in COBOL? Almost the only thing you can do is
traverse an array. All other kinds of searches have to be done using a
select statement. Still, COBOL was a very popular language, so the
concept of letting the DB take care of the collection handling was
probably not a very bad idea.
So, databases were used to overcome the deficiencies in COBOL's support
for data structures? Cool, though I would class that as a workaround. Of
course, now we _have_ languages with much better datastructure support, so
that workaround is no longer necessary.
Post by frebe
The concept of loading data into advanced structures instead of
making select calls was originally driven by performance reasons.
Currently I work with a scheduling application where I have to use this
concept. The result is bloated and messy code using TreeMap, HashMap,
ArrayList, etc, and every minute I wish I could make a select
statement instead. But the time overhead of the interprocess
communication is simply too high. (One solution would indeed be to use
hsqldb as an all-in-RAM, in-process DB.) But I strongly argue for
using this concept only when performance reasons force you to. Using
select statements will give you much less bloated code.
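The all-in-RAM, in-process option mentioned here can be sketched with any embedded database. A minimal illustration, using Python's built-in sqlite3 as a stand-in for hsqldb (the table and data are invented for the example):

```python
import sqlite3

# An in-process, all-in-RAM database: no network or inter-process
# overhead, but select statements instead of bespoke map/list code.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE task (id INTEGER PRIMARY KEY, name TEXT, start INTEGER)")
conn.executemany("INSERT INTO task (name, start) VALUES (?, ?)",
                 [("calibrate", 3), ("assemble", 1), ("inspect", 2)])

# One query replaces hand-rolled TreeMap/ArrayList traversal code.
rows = conn.execute(
    "SELECT name FROM task WHERE start >= ? ORDER BY start", (2,)).fetchall()
print([r[0] for r in rows])  # ['inspect', 'calibrate']
```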
If you package up your data structures appropriately and offer suitable
operations on them, you can end up with a system that becomes similarly
easy to use as a database, but still offers you all the performance
benefits of using your bespoke data structures.

Also keep in mind that the SQL necessary to operate on a complex database
can become quite, um, _interesting_, such that even sequences of map
lookup, array indexing and pointer traversal can look quite simple and
straightforward in comparison. Of course, this very much depends on the
level of familiarity of the developer with both the application's language
and environment on the one hand, and with SQL and the relation model on
the other.

Yes, a RDBMS is something that offers a hugely flexible system for storing
data, and a unified interface for accessing and otherwise working with
those data. But as you mentioned yourself, performance considerations do
come into play as well. It may well be that even an in-memory RDBMS might
be too slow for your application.

And there still remains the issue of business logic. Using an OO system,
you can keep your data, their interrelationships etc, the primitive data
operations, _and_ their business logic all together. I know that much can
be done in RDBMS:es these days by writing stored procedures, adding
triggers etc, which allow you to essentially put business logic into the
database engine, though I am somewhat wary of using such an approach as it
can lock you into a specific database vendor's extension language. Of
course, looking at hsqldb, stored procedures etc would be written in Java,
just as the rest of the program, and executed potentially within the same
virtual machine ... Interesting things to think about.

By the way, thank you for offering useful and civilised discussion and
debate :)
Post by frebe
Fredrik Bertilsson
http://butler.sourceforge.net
// Christian Brunschen
frebe
2006-01-25 11:10:13 UTC
Permalink
Transactions, as offered by RDBMS:es, are limited to the data stored within the RDBMS,
so that if you want to use its transaction capability, you need to store the
appropriate data in the RDBMS.
Many RDBMS vendors support distributed transactions (like XA). Other
resources, such as messages, may also be part of the same transaction.
The component that controls the transaction is indeed outside the
RDBMS, but the RDBMS is able to participate in the transaction, unlike
a file system.
but if you wanted to use one or maybe two of them _without_ wanting to use its
data storage model
Which other data storage model do you have in mind? XML files? Flat
files? In most enterprise scenarios these kinds of low-level storage
models are simply not enough.
both queries, transactions and referential integrity
are just as applicable to persistent as to non-persistent data,
Exactly my point. You need queries even if you don't need persistence.
An RDBMS may be useful even if you don't have any persistence needs.
So, if all you want is transactions, an RDBMS probably shouldn't be the first
place to go.
So, where should I go?
What precisely is your definition of an 'enterprise application'?
I don't have a clear definition. I use the word to make people
understand that I am not talking about MP3 players, FTP clients etc. I
am mainly talking about applications for accounting, logistics
management, production control, etc.
but from your statement above it sounds like you would characterise enterprise
applications as using databases _not_ for incoming or outgoing data, but
_mainly_ for transient data used only in the process of whatever they are
doing?
An average enterprise application needs persistence. But it also needs
a lot of features provided by an RDBMS that are not related to
persistence (such as queries).
So, databases were used to overcome the deficiencies in COBOL's support
for data structures?
No, the creators of COBOL did not make any advanced collection features
in the language simply because it was not necessary. A high-level
language was supposed to not handle data in a low-level way. Collection
handling was supposed to be done in a high-level way (SQL).
Post by frebe
If you look at enterprise applications outside the OO world, you will
find that they heavily use embedded SQL.
Please, give me some more specific pointers.
Do you doubt that pre-OO applications make heavy use of embedded SQL?
Look at the Oracle products Pro*C or Pro*COBOL, for example. The
corresponding product for Java, SQLJ, has gained very little attention
because the OO world rejects the use of embedded SQL.
If you package up your data structures appropriately and offer suitable
operations on them, you can end up with a system that becomes similarly
easy to use as a database,
Let's say I want to find every customer order from a customer located in
a given city. I use this select statement:
select *
from order
join customer on order.customerid=customer.id
where customer.city=?

What would your code look like?
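For contrast, a rough sketch of what the hand-rolled, in-memory version of that join might look like (Python used as a neutral illustration; the data and field names are invented, mirroring the columns in the query):

```python
# Hypothetical in-memory data; field names mirror the tables in the query.
customers = [
    {"id": 1, "name": "Acme", "city": "Oslo"},
    {"id": 2, "name": "Beta", "city": "Lund"},
]
orders = [
    {"id": 10, "customerid": 1, "total": 250},
    {"id": 11, "customerid": 2, "total": 90},
    {"id": 12, "customerid": 1, "total": 40},
]

def orders_for_city(city):
    # Hand-rolled equivalent of the join: index customers by id,
    # then filter orders through that index.
    by_id = {c["id"]: c for c in customers}
    return [o for o in orders if by_id[o["customerid"]]["city"] == city]

print([o["id"] for o in orders_for_city("Oslo")])  # [10, 12]
```

Every new query shape means another hand-written index-and-filter function like this one, which is the "bloated code" complaint in a nutshell.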
It may well be that even an in-memory RDBMS might
be too slow for your application.
But not very likely for enterprise applications. Most of the time
overhead of using an RDBMS is in the inter-process and network
communication. Using stored procedures gives you a huge performance
gain.
Of course, looking at hsqldb, stored procedures etc would be written in Java,
just as the rest of the program, and executed potentially within the same
virtual machine ... Interesting things to think about.
Done it already. Love it. But other RDBMSes have support for Java stored
procedures too.

Fredrik Bertilsson
http://butler.sourceforge.net
H. S. Lahman
2006-01-23 17:35:03 UTC
Permalink
Responding to Jacobs...
Post by topmind
Post by H. S. Lahman
Post by topmind
Why do you keep saying "persistence"? I don't think you get the idea of
RDBMS and query languages. Like I said, think of a RDBMS as an
"attribute management system". Forget about disk drives for now. Saying
it is only about "persistence" is simply misleading.
Persistent data is data that is stored externally between executions of
an application. RDBs are a response to that need combined with a
requirement that access be generic (i.e., the data can be accessed by
many different applications, each with unique usage contexts). That's
what DBMSes do -- they manage persistent data storage and provide
generic, context-independent access to that data storage.
I disagree. You can use them that way, but I tend to view them as an
"attribute management system". They do well modelling "things" in the
real world (and virtual things) by keeping track of attributes of them.
They also provide important and useful services such as concurrency
management, joins (cross-referencing), sorting, and aggregation (sums,
counts, averages, etc.) Mere persistence does NOT have to include
things such as joins and aggregation, making one do them in app code
instead.
The things you are talking about are pure static manipulations of data
within constraints defined by intrinsic, static data relationships.
That's what RDBs do. That sort of thing /should/ be in the DBMS because
it is part of generic (problem-independent) data storage access.

The issue in this subthread is putting business logic for specific,
dynamic problem solutions in the DBMS -- specifically stored procedures
that the DBMS executes automatically when the data store is modified.
Post by topmind
Post by H. S. Lahman
Post by topmind
Post by H. S. Lahman
Post by topmind
Post by H. S. Lahman
More important to the context here, that implementation is
quite specific to one single paradigm for stored data.
Any language or API is pretty much going to target a specific paradigm
or two. I don't see any magic way around this, at least not that you
offer. UML is no different.
4GLs get around it because they are independent of /all/ computing space
implementations.
I am not sure UML qualifies as 4th Gen. Just because it can be
translated into multiple languages does not mean anything beyond Turing
Equivalency. C can be translated into Java and vice versa.
A UML OOA model can be implemented unambiguously and without change in a
manual system. In fact, that is a test reviewers use to detect
implementation pollution. The OOA model for, say, a catalogue-driven
order entry system will look exactly the same whether it is implemented
as a 19th century mail-in Sears catalogue or a modern broswer-based web
application. That is not true for any 3GL.
One can execute Java code by hand also. If you follow the spec, it
should always come out the same. (There may be minor vendor differences
due to errors or fuzzy areas in the spec, but this is true of any
non-trivial tech language, including UML.)
It's not the execution model. It is what you have to specify when you
write the Java code and it is the intrinsic structural decisions (e.g.,
scoping rules based on 3GL block structuring).
Post by topmind
Post by H. S. Lahman
Post by topmind
Post by H. S. Lahman
However, that's not the point. SQL is a 3GL but comparing it to Java is
specious because Java is a general purpose 3GL.
Again, this gets into the definition of "general purpose". I agree that
query languages are not meant to do the *entire* application, but that
does not mean it is not general purpose. File systems are "general
purpose", but that does not mean that one writes an entire application
in *only* a file system. It is a general purpose *tool*, NOT intended
to be the whole enchilada.
Huh?!? If you can't write the entire application in it, then it isn't
general purpose by definition.
Post by topmind
A hammer is a general purpose tool, but that does not mean one is
supposed to ONLY use a hammer. You need to clarify your working
definition of "general purpose", and then show it the consensus
definition for 4GL.
huh**2?!? A hammer is not a general purpose tool by any stretch of the
imagination.
Okay, then what is a "general purpose tool"? If I was going to put
together a tool box for a trip where the mission details are not given
ahead of time, I would certainly pack a hammer. Only an idiot would
not. No, it is not a one-size-fits-all tool, and I don't expect one.
Good apps don't need a one-size-fits-all language because they can use
yin-yang complimentary tools.
You're deflecting again. Your assertion was that SQL was equivalent to
Java. My response was that Java is general purpose, which it clearly is
because I can solve any problem solvable on a computer with it, while
SQL is not, which it isn't because certain classes of computer problems
simply can't be solved with it.
Post by topmind
Post by H. S. Lahman
Post by topmind
Post by H. S. Lahman
SQL represents a
solution to persistence access that is designed around a particular
model of persistence itself. So one can't even use it for general
purpose access to persistence, much less general computing.
Please clarify. Something can still be within a paradigm and be general
purpose. Further GP does not necessarily mean "all purpose", for
nothing is practically all purpose.
SQL is designed around the RDB paradigm for persistence. It can't be
used for, say, accessing lines in a text flat file because the text file
does not organize the data the way SQL expects. So SQL is not a
general purpose interface to stored data. Apropos of your point,
though, SQL is quite general purpose for accessing /any/ data in a
uniform way from a data store _organized like an RDB_.
Well, I agree that SQL is probably not a very good way to reference
free-form text. However, just because it is not good for everything
does not mean it is not general purpose. Again, NOTHING is good at
EVERYTHING. Do you claim that there is something that is good at
everything? No? I didn't think so.
This is just an example of deflecting through deliberate obtuseness.
You are trying to recast the issue in terms of relative "good". SQL
simply cannot access individual lines in a flat text file. Flat text
files are a valid persistence mechanism. QED.
Post by topmind
table: textFile
--------------
fileID
ParagraphID
SentenceID
token (word or punctuation)
tokenType (punctuation, word, non-printable, etc.)
(Non-printable characters are represented in Hex notation.)
It can be done.
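The proposed schema can be sketched directly (Python with sqlite3 for illustration; a `seq` column is added here as an assumption, since something must record token order within a sentence):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The table proposed above, plus a hypothetical `seq` column to keep
# token order within a sentence (an assumption, not in the original).
conn.execute("""CREATE TABLE textFile (
    fileID INTEGER, ParagraphID INTEGER, SentenceID INTEGER,
    seq INTEGER, token TEXT, tokenType TEXT)""")

words = "flat files can be queried".split()
conn.executemany("INSERT INTO textFile VALUES (?,?,?,?,?,?)",
                 [(1, 1, 1, i, w, "word") for i, w in enumerate(words)])

# Reconstruct sentence 1 of paragraph 1 by ordering on seq.
rows = conn.execute("""SELECT token FROM textFile
    WHERE fileID=1 AND ParagraphID=1 AND SentenceID=1
    ORDER BY seq""").fetchall()
sentence = " ".join(r[0] for r in rows)
print(sentence)  # flat files can be queried
```

Of course, this only shows that the representation is expressible; the identity-maintenance objections raised in the reply below it are a separate matter.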
Where do you get the ParagraphID and SentenceID? How do you maintain it
when paragraphs are inserted and deleted? You can parse the file and
assign your own number identities based on sequence in the file and you
can manipulate those for additions and insertions. However, that
requires changing the identity keys for paragraphs and sentences that
are, themselves, unchanged. IOW, you are violating the RDM notion of
unique /entity/ identity.

Then how do you keep track of those changes between application
executions? Since the identities are not stored in the file, that
becomes a problem. You could create your own file to keep track, but
then you are modifying the data store to solve the problem by making the
flat text file look like an RDB.

If there is only one application that cares, it can read the file and
renumber everything at startup. But that doesn't work in multi-user
contexts where things have to be coordinated (e.g., two applications
need to share information about individual paragraphs and sentences).
IOW, it is a one-shot solution, not a generic data model like the
RDB/SQL paradigm.

However, these problems just reflect the more general issue that there
is no way for your software user to tell you to extract a particular
paragraph or sentence without providing the entire text of the paragraph
or sentence. That is, there is no way for the user or anyone outside
your program to know your ParagraphIDs and SentenceIDs when there are
insertions and deletions.
Post by topmind
Post by H. S. Lahman
Post by topmind
Post by H. S. Lahman
Post by topmind
Post by H. S. Lahman
Go look at an SA/D Data Flow Diagram or a UML Activity Diagram. They
express data store access at a high level of abstraction that is
independent of the actual storage mechanism. SQL, ISAM, CODASYL, gets,
or any other access mechanism, is then an implementation of that generic
access specification.
SQL is independent of the "actual storage mechanism". It is an
interface. You may not like the interface, but that is another matter.
Repeat after me: "SQL is an interface, SQL is an interface, SQL is an
interface"....
Try using SQL vs. flat files if you think it is independent of the
actual storage mechanism. (Actually, you probably could if the flat
files happened to be normalized to the RDM, but the SQL engine would be
a doozy and would have to be tailored locally to the files.) SQL
implements the RDB view of persistence and only the RDB view.
How is that different than ANY other interface? You are claiming magic
powers of UML that it simply does not have.
There is a distinction between describing an interface and designing its
semantics. UML is quite capable of describing the semantics of any
interface. Deciding what the semantics should be is quite another thing
that the developer owns.
When I have a subsystem in my application to access persistent data,
that subsystem has an interface that the rest of the application talks
to. That interface is designed around the rest of the application's
data needs, not the persistence mechanisms. It is the job of the
persistence access subsystem to convert the problem solution's data
needs into the access mechanisms de jour.
I could wrap SQL calls in functions, like that PinkScarf example I
already gave. But unless the same kind of thing is called multiple
times, it is just code bloat. Query languages can be compact in many
circumstances. Putting bloated single-use wrappers around it gets you
nothing except more code. Thus, I *can* play the same game via function
wrappers if need be.
That's not the point. In your example there is no semantic shift; you
are just changing the syntax.
Post by topmind
Post by H. S. Lahman
If the persistence is an RDB, then the subsystem implementation will
<probably> use SQL. If the persistence is flat text files, it will use
the OS file manager and streaming facilities. If it is clay tablets, it
will use an OCR and stylus device driver API. That allows me to plug &
play the persistence mechanisms without touching the application
solution because it still talks to the same interface regardless of the
implementation of the subsystem.
Flat files don't have near the power. That is a poor analogy. RDBMS are
MORE than persistence. Say it over and over until it clicks in. Just
because YOU use it ONLY for persistence does not make it the only way
to build systems, just the bloated reinvent-the-wheel way JUST so that
you can claim UML purity.
It is not an analogy. I am describing different /concrete/ persistence
mechanisms. If I need information from a Windows .ini file, I don't
have a choice about the persistence mechanism. But I don't want my
application problem solution to have to be concerned about that
constraint; I want the problem solution to be indifferent to what
persistence mechanism is used. That separation of concerns ensures
better maintainability.
Post by topmind
Post by H. S. Lahman
IOW, the semantics of the interface to the subsystem is /designed/ at a
different level of abstraction than that of the subsystem
implementation.
Bull. UML is often a LOWER level of abstraction because it can take
much more code/language to specify the same thing.
Another example of an utterly absurd assertion whose sole purpose is to
drive an opponent who knows better crazy. I know you better so I'm not
going to bite by dignifying such nonsense with a response.

However, it is annoying enough that you would still _try it on me_ when
you know I understand your game that I'll sign off here. Ta-ta.


*************
There is nothing wrong with me that could
not be cured by a capful of Drano.

H. S. Lahman
***@pathfindermda.com
Pathfinder Solutions -- Put MDA to Work
http://www.pathfindermda.com
blog: http://pathfinderpeople.blogs.com/hslahman
(888)OOA-PATH
frebe
2006-01-24 07:09:04 UTC
Permalink
Post by H. S. Lahman
Post by topmind
Bull. UML is often a LOWER level of abstraction because it can take
much more code/language to specify the same thing.
Another example of an utterly absurd assertion whose sole purpose is to
drive an opponent who knows better crazy. I know you better so I'm not
going to bite by dignifying such nonsense with a response.
Why is this so absurd? The guys making an RDBMS might use UML for
designing the software before coding.

Fredrik Bertilsson
http://butler.sourceforge.net
topmind
2006-01-23 06:15:40 UTC
Permalink
(Part 2 of reply)
Post by H. S. Lahman
Post by topmind
Post by H. S. Lahman
Compute a logarithm. You can't hedge by dismissing "scientific"
computations.
I didn't. Nothing is ideal for everything under the sun. Nothing. See
above about general-purpose tools.
Post by H. S. Lahman
Try doing forecasting in an inventory control system w/o
"scientific" computations.
I am not sure what you are implying here. I did not claim that
scientific computation was not necessary.
I was just anticipating your deflection; you've been using the
give-me-an-example ploy for years. B-) When the example is provided
you deflect by attacking it on grounds unrelated to the original point.
No, in this case one example is not sufficient. "Always" is not a claim
I made. Thus, one exception, or even a few, does not pop my point
unless they are very common.
Post by H. S. Lahman
That's usually easy to do because examples are deliberately kept
simple to focus on the point in hand. That allows you to bring in
unstated requirements, programming practices designed for other
contexts, and whatnot to attack the example on grounds unrelated to the
original point. In this case, though, you screwed up by setting up a
basis for the deflection ahead of time.
You asked for an example outside of "timing".
Actually, it depends on the type of scientific computation. If the
computation involves a lot of information and a lot of cross
referencing, then a DB may be the appropriate tool. (I gave a list of
common exception patterns in a reply to P. May.)
Post by H. S. Lahman
The main reason SQL isn't
a general purpose 3GL is that it can't handle dynamics (algorithmic
processing) very well. So the obvious examples are going to tend to be
algorithmic, such as computing a logarithm. But your parenthetical
hedge set up a basis for dismissing any obvious example as "scientific"
when you subsequently deflect. Then later you can argue the point was
never demonstrated.
I am not sure what you mean. I agree that "algorithmic" tends to be
areas where DB's will often not be helpful. There are some places where
hammers are useless also. However, I still consider them a general
purpose tool.
Post by H. S. Lahman
Post by topmind
Post by H. S. Lahman
Or try encoding the pattern recognition that
the user of a CRUD/USER application applies to the presented data. The
reality is that IT is now solving a bunch of problems that are
computationally intensive.
As usual, "it depends". Problems where there is a lot of "chomping" on
a small set of data are probably not something DB's are good at (at
this time). An example might be the Traveling Salesman puzzle. However,
problems where the input is large and from multiple entities are more
up the DB's alley.
The Traveling Salesman problem can be arbitrarily large and the RDB
model will still probably not be useful because...
<aside>
FYI, most of the Operations Research algorithms are actually pretty
simple when written out in equations and the core processing doesn't
require a lot of code. Typically most of the code is involved in
getting the data into the application, setting up data structures, and
reporting the results. In addition, the interesting problems are huge
and involve vast amounts of data.
For example, the logistics for the '44 D-Day invasion of Normandy held
the record as the largest linear programming problem ever solved well
into the '70s. The equations for the Simplex solution were written in a
few lines but the pile of data processed was humongous and the actual
execution took months. (It had to be split up into many chunks because
of the MTTF of the computer hardware and a lot of preprocessing was done
by acres of clerks with hand-cranked calculators.)
</aside>
That is a very specific mathematic process such that a dedicated
custom-built data-structure plus algorithm will out-perform a DB. In a
sense, a linear programming system is *also* perhaps a general-purpose
tool because LP problems can be found in diverse industries, but so far
less common than DB's.
Post by H. S. Lahman
Post by topmind
(It may be possible to use a DB to solve Salesmen quickly, but few
bother to research that area.)
Unlikely. It's an NP-complete problem, so the worst case always involves
an exhaustive search of all possible tours (i.e., O(N!)). The
exotic algorithms just provide /average/ performance that approaches
O(NlogN). But those algorithms require data structures that are highly
tailored to the solution. And because of the crunching one wants
identity in the form of array indices, not embedded in tables or the
problem doesn't get solved in a lifetime.
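As an aside on the exhaustive-search point, a minimal brute-force sketch (Python; the four-city distances are invented for illustration) makes the factorial blow-up concrete:

```python
from itertools import permutations

# Brute-force TSP: fix city "A" as the start, try every ordering of the
# rest. This is (n-1)! tours -- factorial, not polynomial, growth.
dist = {("A", "B"): 2, ("A", "C"): 9, ("A", "D"): 10,
        ("B", "C"): 6, ("B", "D"): 4, ("C", "D"): 8}

def d(x, y):
    # Symmetric distance lookup.
    return dist[(x, y)] if (x, y) in dist else dist[(y, x)]

def tour_cost(p):
    # Cost of the round trip A -> p[0] -> ... -> p[-1] -> A.
    legs = [("A", p[0])] + list(zip(p, p[1:])) + [(p[-1], "A")]
    return sum(d(x, y) for x, y in legs)

best = min(permutations(["B", "C", "D"]), key=tour_cost)
print(best, tour_cost(best))  # best tour through the toy instance
```

Note how the solution lives entirely in indexed, tailored structures (a dict and tuples); there is no natural place for a table scan in the inner loop.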
Just because it hasn't been done does not mean it is not possible. Most
algorithm researchers ignore DB's out of historical habit.
Post by H. S. Lahman
Post by topmind
Post by H. S. Lahman
Post by topmind
Post by H. S. Lahman
BTW, remember that I am a translationist. When I do a UML model, I
don't care what language the transformation engine targets in the model
implementation. (In fact, transformation engines for R-T/E typically
target straight C from the OOA models for performance reasons.) Thus
every 3GL (or Assembly) represents a viable alternative implementation
of the notion of '3GL'.
Well, UML *is* a language. It is a visual language just like LabView is.
Exactly. But solutions at the OOA level are 4GLs because they can be
unambiguously implemented without change on any platform with any
computing technologies.
So can any Turing Complete language.
And your point is...?
UML is no different than any other language in terms of cross-emulation
and has no extra powers.
Post by H. S. Lahman
On separation of concerns of problem solving dynamics vs. data
Post by topmind
Post by H. S. Lahman
Post by topmind
"Separation" is generally irrelevant in cyber-land. It is a physical
concept, not a logical one. Perhaps you mean "isolatable", which can be
made to be dynamic, based on needs. "Isolatable" means that there is
enough info to produce a separated *view* if and when needed. This is
the nice thing about DB's: you don't have to have One-and-only-one
separation/taxonomy up front. OO tends to want one-taxonomy-fits-all
and tries to find the One True Taxonomy, which is the fast train to
Messland. Use the virtual power of computers to compute as-needed
groupings based on metadata.
You know very well what I mean by 'separation of concerns' in a software
context, so don't waste our time recasting it. Modularity has been a
Good Practice since the late '50s.
If there is only one concern set where each concern is mutually
exclusive, then we have no disagreement. In practice there are usually
multiple "partitioning" candidates, and that is where the disagreements
usually arise. File and text systems don't make it easy to have
partitioning in all dimensions, so compromises must be made. It is "my
factor is more important than your factor, neener neener". If there is
only one way to slice the pizza, then there is no problem. But if there
are multiple ways, then a fight breaks out.
This is one reason why DB's are useful: the more info you put into the
DB instead of code, the more ad-hoc, situational partitionings you can
view. You are not forced to pick the One Right Taxonomy of
partitioning. Categorizational philosphers came to the consensus that
there is no One Right Taxonomy for most real-world things.
There are three accepted criteria for application partitioning (i.e.,
separating concerns at the scale of subsystems): Subject matter, level
of abstraction, and requirements allocation via client/service
relationships. (BTW, this has nothing to do with OO; it is basic
Systems Engineering.)
Even within "subject matter" there are probably multiple orthogonal
partitioning candidates. So that one alone carries my point.
Post by H. S. Lahman
Subject matter: Clearly static data storage and providing generic access
to it is a different subject matter than solving Problem X.
That might be true if DB's were *only* about persistence. But, they are
not.
Post by H. S. Lahman
Level of abstraction: Outside CRUD/USER processing the detailed
manipulation of data storage (e.g. ,two-phased commit) is clearly at a
much lower level of abstraction than the algorithmic processing the
solves a particular problem. IOW, the application solution is
completely indifferent to where and how data is stored. One should be
able to solve the problem the same way regardless of what the
persistence mechanisms are.
"Same way" may hard-wire the solution to a particular paradigm. How is
hard-wiring the solution to a relational paradigm worse than
hard-wiring it to your pet paradigm? You are in the same boat.
Paradigm-neutral algorithms have not been invented yet (and UML ain't
it).
Post by H. S. Lahman
That substitutability means that the
problem solution is at a higher level of abstraction than the
persistence mechanisms.
IIIFFFF we were only dealing with "persistence". But, we are not.
Post by H. S. Lahman
Requirements Allocation: Clearly the requirements for persistence
implementation and access are quite different than the requirements on
the specific solution of Problem X.
You are obsessed with "persistence". Many DB features can be used even
without the expectation of persistence. (See "desktop DB days" above.)
Post by H. S. Lahman
So under all three of these criteria it makes sense to separate the
concerns of persistence from individual problem solutions. That's
exactly what DBMSes do. The problems only come into play when one
violates that separation of concerns and starts bleeding specific
problem solutions into the DBMS itself.
The boundary is not really hard in practice. It *could* be, I agree,
but generally makes for a bloated, bureaucratic system design. UML and
OO interfaces tend to be that way these days.
Post by H. S. Lahman
Post by topmind
Post by H. S. Lahman
The RDB paradigm
is not designed for context-dependent problem solving; it is designed
for generic static data storage and access in a context-independent manner.
I think what you view as context-dependent is not really context
dependent after all. It is just your pet way of viewing the world
because of all the OOP anti-DB hype.
My view of context-dependence is the solution to a /particular/ problem.
Each application solves a unique problem. IOW, the problem is the
context. RDBs provide persistence that allows all the applications to
access the data in a uniform way regardless of what specific problem
they are solving.
Whether one can solve the problem in a reasonable fashion with the data
structures mapped to the RDB structure depends on the nature of the
problem. For CRUD/USER processing one can. For problems outside that
realm one can't so one needs to convert data into structures tailored to
the problem in hand.
Tables are a general-purpose data structure. They can represent
anything that stacks, maps, trees, sets, etc can. Plus, they flex
better. For example, if you use a dedicated map, you cannot add a
second value cell (3 total cells) without a lot of code rework. With
tables it is a snap.
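The "second value cell" point can be sketched in one statement (Python with sqlite3; the table and column names are made up for the example):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# A table used as a map: key -> value.
conn.execute("CREATE TABLE price (sku TEXT PRIMARY KEY, amount REAL)")
conn.execute("INSERT INTO price VALUES ('A1', 9.5)")

# Adding a second "value cell" is one DDL statement; a dedicated
# dict[str, float] would instead force a change of value type at
# every call site.
conn.execute("ALTER TABLE price ADD COLUMN currency TEXT DEFAULT 'USD'")
row = conn.execute("SELECT amount, currency FROM price WHERE sku='A1'").fetchone()
print(row)  # (9.5, 'USD')
```

Existing queries that select only `amount` keep working unchanged, which is the flexibility being claimed.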

I agree that a dedicated map may be faster in some cases, but usually
only at small scale, where performance is not an issue anyhow.
Dedicated structure kits rarely come with disk/RAM-caching
management and concurrency management, for example. Thus, if your needs
grow, you are hosed.

Dedicated *anything* is usually at least slightly faster than a
general-purpose tool used for that purpose. DB's just make it so you
don't have to roll-your-own 95% of the time.
Post by H. S. Lahman
[Note that this is relevant to the point above about providing SQL
drivers for different storage paradigms. That makes sense for CRUD/USER
environments because one is already employing SQL as the norm. So long
as the exceptions requiring a special driver are fairly rare, one can
justify the single access paradigm. However, it makes no sense at all
for non-CRUD/USER environments because one has to reformat the data to
the problem solution anyway. So rather than reformatting twice, one
should just reformat once from a driver that optimizes for the storage
paradigm.]
Please clarify. An example may help.
Post by H. S. Lahman
Post by topmind
Post by H. S. Lahman
Before you argue that the RAM RDB saves developer effort because it is
largely reusable and that may be worth more than performance, I agree.
But IME for /large/ non-CRUD/USER problems the computer is usually too
small and performance is critical.
Please clarify. Ideally the RDBMS would determine what goes into RAM
and what to disk such that the app developer doesn't have to give a
rat's rear. Cache management generally does this, but a two-way system
is probably not as fast as a dedicated RAM DB. Even if the two-way
ideal is not fully reached, one will soon have the *option* to switch
some or all of an app to a full-RAM DB as needed without rewriting the
app. The query language abstracts/encapsulates/hides that detail away.
This is another non sequitur deflection. Caching and whatnot is not
relevant to the point I was making. There is a business trade-off
between run-time performance and developer development time that every
shop must make. Sometimes greater developer productivity can justify
reusing the RDB paradigm when more efficient specific solutions are
available.
However, my point was that those situations tend to map to CRUD/USER
processing. Once problems become more complex than format conversions
in UI/DB pipeline applications, performance becomes the dominant
consideration. I spent years solving large NP-complete problems on
machines like PDP11s and there was no contest on that issue; customers
simply would not spring for Crays in their systems but they would spring
for a marginal extra developer cost prorated across all systems.
See above on "linear programming". A dedicated linear programming tool
will indeed outperform a DB. A hammer specially designed for a certain
kind of nail will outperform an off-the-shelf hammer. No news there.
Post by H. S. Lahman
Post by topmind
Post by H. S. Lahman
Post by topmind
Post by H. S. Lahman
For a non-CRUD/USER application, SQL and the DBMS provide the first
relationship while a persistence access subsystem provides the
reformatting for the second relationship.
Reformatting? Please clarify.
The solution needs a different view of the data that is tailored to the
problem in hand. So the RDB view needs to be converted to the solution
view (and vice versa). IOW, one needs to reformat the RDB data
structures to the solution data structures.
This is called a "result set" or "view". Most queries customize the
data to a particular task. Thus, it *is* a solution view.
That formatting is cosmetic. The most sophisticated reformatting is
combining data from multiple tables in a join into a single table dataset.
I am talking about data structures whose semantics are different,
whose access paradigms are different, whose relationships are different,
and/or whose structure is different. IOW, there isn't a 1:1 mapping to
the RDB. For example, if my solution requires the data to be organized
hierarchically SQL queries can't do that.
Yes it can. In fact, Oracle has vendor-specific idioms in its SQL just
for trees. However, trees are often the wrong data structure for
non-trivial things anyhow. They are popular because they are
conceptually simple, but in reality they bog down from a management
standpoint because the real world is better classified with sets, not
trees, 90+ percent of the time. The limits of, and non-tree additions
to, IBM's IMS already demonstrated that. We don't need to relearn the
lessons of the '60s.
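
A sketch of a hierarchical query in plain SQL, using SQLite's WITH
RECURSIVE rather than Oracle's CONNECT BY idiom (the emp table and its
contents are invented for illustration):

```python
import sqlite3

# A reporting hierarchy stored relationally: each row points at its boss.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE emp (id INTEGER PRIMARY KEY, name TEXT, boss INTEGER)")
con.executemany("INSERT INTO emp VALUES (?, ?, ?)",
                [(1, 'ceo', None), (2, 'vp', 1), (3, 'dev', 2)])

# Walk the tree from the root down with a recursive common table expression.
rows = con.execute("""
    WITH RECURSIVE chain(id, name, depth) AS (
        SELECT id, name, 0 FROM emp WHERE boss IS NULL
        UNION ALL
        SELECT e.id, e.name, c.depth + 1
        FROM emp e JOIN chain c ON e.boss = c.id
    )
    SELECT name, depth FROM chain ORDER BY depth
""").fetchall()
print(rows)  # [('ceo', 0), ('vp', 1), ('dev', 2)]
```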
Post by H. S. Lahman
Post by topmind
Post by H. S. Lahman
Post by topmind
Post by H. S. Lahman
When the only tool you have is a Hammer, then everything looks like a
Nail.
No, out of necessity I started my career without DB usage, and I never
want to return there.
That's because you are in a CRUD/USER environment where P/R works quite
well. Try a problem like allocating a fixed marketing budget to various
national, state, and local media outlets in an optimal fashion for a
Fortune 500.
Again, I never said that DB's are good for every problem. I don't know
enough about that particular scenario to propose a DB-centric solution
and to know whether it is an exception or not.
Unless you provide some specific use case or detailed scenario, it is
anecdote against anecdote here.
RDBMS are a common tool. The sales of Oracle, DB2, and Sybase are
gigantic.
Of course they are. They provide a generic, context-independent access
to stored data that any application can use. That's why they exist.
But that is beside the point.
The issue here is where individual business problems should get solved.
My assertion is that is an application responsibility. For CRUD/USER
processing one can use the same data structures in the solution as in
the RDB so P/R as a software development paradigm works well.
Generally, though, one can't use the same data structures once one is
out of the CRUD/USER realm so P/R doesn't work very well.
Again, it is a disagreement about the size and range of the cases where
RDBMSes don't do well. The special cases are fairly rare, but you claim
they are common. Most software out there is business applications. It is
the biggest domain, perhaps even more than 50% of all software written.
Post by H. S. Lahman
Post by topmind
Post by H. S. Lahman
Post by topmind
[...]
Fine, show how OO better solves business rule management. (Many if not
most biz rules can be encoded as data, BTW, if you know how.)
Why? That has nothing to do with whether a DBMS should execute dynamic
business rules and policies. This isn't an OO vs. P/R discussion, much
as you would like to make it so.
Are you saying it is a UML-versus-RDB debate?
Another deflection. How do you get from how complex report generation
software is to UML vs. RDB? The topic here has nothing to do with OO,
P/R, or UML. It is about the complexity of processing for CRUD/USER
applications vs. other applications.
There is another part of this topic devoted to measuring complexity. No
need to reinvent that debate here. If you claim biz apps are inherently
"simpler", you will need to provide some kind of objective metric, or
at least something besides a personal feeling. You seem like a
well-educated person; I expect more from you than feelings and
argument-from-authority.
Post by H. S. Lahman
H. S. Lahman
(Note that I had to snip some context because the usenet system was
complaining about size and I am too lazy to split this yet again.)

-T-
Patrick May
2006-01-19 21:25:38 UTC
Permalink
Post by topmind
Post by H. S. Lahman
Go look at an SA/D Data Flow Diagram or a UML Activity Diagram. They
express data store access at a high level of abstraction that is
independent of the actual storage mechanism. SQL, ISAM, CODASYL, gets,
or any other access mechanism, is then an implementation of that generic
access specification.
[ . . . ]
Post by topmind
method getGreenScarvesCostingLessThan100dollars(...) {
    sql = "select * from products where prod='scarves'
           and color='green' and price < 100"
    return(foo.execute(sql))
}
Please provide a cite to any commercial or open source tool that
creates such code from UML models.

Sincerely,

Patrick

------------------------------------------------------------------------
S P Engineering, Inc. | The experts in large scale distributed OO
| systems design and implementation.
***@spe.com | (C++, Java, Common Lisp, Jini, CORBA, UML)
Daniel Parker
2006-01-17 04:08:02 UTC
Permalink
Post by H. S. Lahman
Responding to Jacobs...
Post by topmind
SQL is not an implementation. What is the difference between locking
yourself to SQL instead of locking yourself to Java? If you want
open-source, then go with PostgreSQL. What is the diff? Java ain't no
universal language either.
Of course it's an implementation! It implements access to physical
storage.
That literally doesn't make sense. It's like saying that a Java interface
is an implementation because it implements access to the properties of a
physical instantiation.
Post by H. S. Lahman
BTW, remember that I am a translationist. When I do a UML model, I don't
care what language the transformation engine targets in the model
implementation. (In fact, transformation engines for R-T/E typically
target straight C from the OOA models for performance reasons.) Thus
every 3GL (or Assembly) represents a viable alternative implementation of
the notion of '3GL'.
I think it's fair to say that SQL has, for all its faults, been enormously
successful, to the tune of a multi-multi-billion dollar industry, and that
the UML translationist approach has not. It's been over ten years since the
translationist industry has claimed to have solved the problem of 100
percent translation, but where is it? It's niche; it's nowhere. Other
technologies have arrived, e.g. the W3C XML stack and particularly XSLT
transformation, that dwarf executable UML in application. Why do you think
that is? What do you think it is about software development that makes
executable UML marginal, and other technologies like SQL important?

Regards,
Daniel Parker
H. S. Lahman
2006-01-17 17:00:57 UTC
Permalink
Responding to Parker...
Post by Daniel Parker
Post by H. S. Lahman
Post by topmind
SQL is not an implementation. What is the difference between locking
yourself to SQL instead of locking yourself to Java? If you want
open-source, then go with PostgreSQL. What is the diff? Java ain't no
universal language either.
Of course it's an implementation! It implements access to physical
storage.
That literally doesn't make sense. It's like saying that a Java interface
is an implementation because it implements access to the properties of a
physical instantiation.
You have to step up a level in abstraction. Imagine you are a code
generator and think of it in terms of invariants and problem space
abstraction.

The invariant is that all physical storage needs to be accessed in some
manner. There are lots of ways to store data and lots of ways to
access. Therefore ISAM, SQL, CODASYL, and C's gets all represent
specific implementations of access to physical storage that resolve the
invariant.

Similarly, for interfaces the invariant is that every popular 3GL
provides a type system that allows access to properties. But each 3GL's
type system provides different syntax. Therefore every popular 3GL is a
specific implementation of a type system interface.
Post by Daniel Parker
Post by H. S. Lahman
BTW, remember that I am a translationist. When I do a UML model, I don't
care what language the transformation engine targets in the model
implementation. (In fact, transformation engines for R-T/E typically
target straight C from the OOA models for performance reasons.) Thus
every 3GL (or Assembly) represents a viable alternative implementation of
the notion of '3GL'.
I think it's fair to say that SQL has, for all its faults, been enormously
successful, to the tune of a multi-multi-billion dollar industry, and that
the UML translationist approach has not. It's been over ten years since the
translationist industry has claimed to have solved the problem of 100
percent translation, but where is it, it's niche, it's nowhere. Other
technologies have arrived, e.g. the W3C XML stack and particularly XSLT
transformation, that dwarf executable UML in application. Why do you think
that is? What do you think it is about software development that makes
executable UML marginal, and other technologies like SQL important?
All the world loves a straight man. B-)

There are actually several reasons. The technology for full translation
has actually been available since the early '80s. However, the
technology had not matured enough for good optimization for _general
computing_ until the late '90s. (Note that it took a decade for C
optimization to get remotely close to the level of optimization that
FORTRAN had in '74 and the translation optimization problem is much more
complex.)

[FYI, translation has already been widely accepted in niches for many
moons. The ATLAS test requirements specification language is
universally used in mil-aero and it is translated directly into
executables. That's a billion dollar business. 4GLs like HPVEE and
LabWindows have also been widely used for electronic system analysis and
design. However, the big translation demo lies in CRUD/USER processing.
Any time one develops an application using a RAD IDE like Access or
Delphi one is essentially using translation. That's a multi-billion
dollar niche that has been around since the '80s.]

Probably the second most important reason for translation to be just
exiting Early Adopter Stage is that using translation requires a major
sea change in the way one develops software. It is not just a matter of
pushing a button and having 3GL or Assembly code pop out the other end.
Translation affects almost every aspect of the development process
from the way one approaches problems to the way one tests. From a
Management perspective, major sea changes in the way things are done
spells RISK and that makes selling translation tough.

I think the third reason is developer resistance and NIH. Most software
developers today have literally grown up writing 3GL code. Trying to
persuade them that, of all the things a software developer does, writing
3GL code is the least important is a tough sell. (Going into a sales
presentation is like stepping back in time to the early '60s trying to
sell COBOL or FORTRAN to a bunch of BAL programmers; the arguments are
the same with only the buzzwords changing.)

A fourth reason is the lack of standardization. Until OMG's MDA effort
all translation tools were monolithic; modeling, code generation,
simulation, and testing were all done in the same tool with proprietary
repositories, AALs, and supporting tools. (Prior to UML, they each had
unique modeling notations as well.) That effectively marries the shop
to a specific vendor. In '95 Pathfinder was the first company to
provide plug & play tools that would work with other vendor's drawing
tools. MDA has changed that in the '00s so now plug & play is a reality.

A fifth reason is price. Translation tools are not cheap because they
require very fancy optimization, graphics for model animation, built-in
test harnesses, and a bunch of other support stuff. (Not to mention
cutting edge automation design.) Vendors want to recover that cost so
one has an Occam's Razor: the Early Adopter market is too small to allow
recovery through shrink-wrap pricing but the market will only grow
slowly if one uses recovery pricing.

However, I think all this is kind of beside the point. Technologies
like SQL, W3C XML stacks, and XSLT transformation are just computing
space /implementations/ that a transformation engine can target. Part
of the transformation engine's optimization problem is picking the right
computing space technology for the problem in hand from those available.
IOW, apropos of the opening point, the application developer using
translation is working at a much higher level of abstraction so
SQL/XML/XSLT concerns belong to a different union -- the transformation
engine developer.


*************
There is nothing wrong with me that could
not be cured by a capful of Drano.

H. S. Lahman
***@pathfindermda.com
Pathfinder Solutions -- Put MDA to Work
http://www.pathfindermda.com
blog: http://pathfinderpeople.blogs.com/hslahman
(888)OOA-PATH
Daniel Parker
2006-01-17 19:01:09 UTC
Permalink
Post by H. S. Lahman
Responding to Parker...
Post by Daniel Parker
Post by H. S. Lahman
Of course it's an implementation! It implements access to physical
storage.
That literally doesn't make sense. It's like saying that a Java interface
is an implementation because it implements access to the properties of a
physical instantiation.
You have to step up a level in abstraction. Imagine you are a code
generator and think of it in terms of invariants and problem space
abstraction.
The invariant is that all physical storage needs to be accessed in some
manner. There are lots of ways to store data and lots of ways to
access. Therefore ISAM, SQL, CODASYL, and C's gets all represent
specific implementations of access to physical storage that resolve the
invariant.
It's not SQL that's the implementation, even in this context; it's the
driver that implements the connectivity to the data provider. Any data
provider that is capable of supplying data in conformance with the SQL
data model will do. A typical driver supports connectivity to a
relational database, but it's common these days for middleware vendors
to provide drivers that will adapt web service output to SQL, according
to certain rules. You can get drivers that will adapt flat file
content to SQL, XML to SQL, and dynamic content from a C++ application
server to SQL. Why do vendors provide such things? Because SQL is a
standard querying language that is widely supported by a whole host of
tools, because it provides a standard interface to data that conforms
to a particular data model. The physical origin of that data is
irrelevant. It's not just a question of conveniencing users with a
syntax that they already know; it's a question of supporting automatic
binding and reusing existing tooling.

In your code generator example, you could easily have multiple drivers
supporting SQL that resolve against very different physical data
sources. (Not to mention XQuery drivers that resolve against the same
sources.)
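
A rough sketch of the flat-file-to-SQL adaptation described above,
assuming the simplest possible "driver": loading CSV content into an
in-memory SQLite table so standard SQL tooling applies. (Real
middleware drivers do this transparently; the file contents and table
name here are invented for illustration.)

```python
import csv
import io
import sqlite3

# Flat-file data that a middleware driver might expose through SQL.
flat_file = io.StringIO("prod,color,price\nscarves,green,25\nhats,blue,40\n")

# The "driver": materialize the rows into a relational table.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE products (prod TEXT, color TEXT, price REAL)")
con.executemany("INSERT INTO products VALUES (:prod, :color, :price)",
                csv.DictReader(flat_file))

# Now any standard SQL tooling can query the flat-file content.
rows = con.execute("SELECT prod FROM products WHERE price < 30").fetchall()
print(rows)  # [('scarves',)]
```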
Post by H. S. Lahman
Post by Daniel Parker
I think it's fair to say that SQL has, for all its faults, been enormously
successful, to the tune of a multi-multi-billion dollar industry, and that
the UML translationist approach has not. It's been over ten years since the
translationist industry has claimed to have solved the problem of 100
percent translation, but where is it, it's niche, it's nowhere. Other
technologies have arrived, e.g. the W3C XML stack and particularly XSLT
transformation, that dwarf executable UML in application. Why do you think
that is? What do you think it is about software development that makes
executable UML marginal, and other technologies like SQL important?
All the world loves a straight man. B-)
:-)
Post by H. S. Lahman
However, the big translation demo lies in CRUD/USER processing.
Any time one develops an application using a RAD IDE like Access or
Delphi one is essentially using translation. That's a multi-billion
dollar niche that has been around since the '80s.]
Right, but Access doesn't do this with executable UML. I don't think
Access would benefit from going in this direction.
Post by H. S. Lahman
A fourth reason is the lack of standardization. Until OMG's MDA effort
all translation tools were monolithic; modeling, code generation,
simulation, and testing were all done in the same tool with proprietary
repositories, AALs, and supporting tools. (Prior to UML, they each had
unique modeling notations as well.) That effectively marries the shop
to a specific vendor. In '95 Pathfinder was the first company to
provide plug & play tools that would work with other vendor's drawing
tools. MDA has changed that in the '00s so now plug & play is a reality.
I grant that we are moving towards a service based world, with
graphical interfaces playing a role in the assembly and orchestration
of services, but I see no evidence that the glue is going to be
executable UML. I don't think that many people care about what the OMG
is doing anymore. Events have overtaken them. I don't think MDA is
very important. There are other evolving standards for plug and play.

Regards,
Daniel Parker
H. S. Lahman
2006-01-19 21:50:50 UTC
Permalink
Responding to Parker...
Post by Daniel Parker
Post by H. S. Lahman
Post by Daniel Parker
Post by H. S. Lahman
Of course it's an implementation! It implements access to physical
storage.
That literally doesn't make sense. It's like saying that a Java interface
is an implementation because it implements access to the properties of a
physical instantiation.
You have to step up a level in abstraction. Imagine you are a code
generator and think of it in terms of invariants and problem space
abstraction.
The invariant is that all physical storage needs to be accessed in some
manner. There are lots of ways to store data and lots of ways to
access. Therefore ISAM, SQL, CODASYL, and C's gets all represent
specific implementations of access to physical storage that resolve the
invariant.
It's not SQL that's the implementation, even in this context; it's the
driver that implements the connectivity to the data provider. Any data
provider that is capable of supplying data in conformance with the SQL
data model will do. A typical driver supports connectivity to a
The operative phrase is "in conformance with". SQL reflects a very
narrowly defined model for data representation...
Post by Daniel Parker
relational database, but it's common these days for middleware vendors
to provide drivers that will adapt web service output to SQL, according
to certain rules. You can get drivers that will adapt flat file
content to SQL, XML to SQL, and dynamic content from a C++ application
server to SQL. Why do vendors provide such things? Because SQL is a
standard querying language that is widely supported by a whole host of
tools, because it provides a standard interface to data that conforms
to a particular data model. The physical origin of that data is
Quite so. Standardization is a Good Thing. So using SQL _when the data
model conforms_ can be a good thing. However, outside CRUD/USER
processing the problem solution data model often does not conform. If
the persistence data model does conform, then one needs a conversion
between the views and I suggest that the conversion to SQL should be
encapsulated in an application subsystem.

I also suggest that when the persistence data model does not conform
(e.g., flat files), then converting to SQL as an intermediary when the
solution data model doesn't conform either is just a waste of time. One
would be better off just converting directly between the views once
rather than performing two conversions. [As far as reuse is concerned,
file managers provided that sort of reuse in the '70s for flat files,
and one can provide quite generic interfaces tailored to an XML model.
But don't get me going on DOMs, which strikes me as akin to providing a
SQL interface for flat files.]
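
One way to read "encapsulated in an application subsystem" is sketched
below, assuming a hypothetical ProductStore class as the single place
where RDB rows become solution-side objects. All names are invented;
this is an illustration of the idea, not Lahman's actual design.

```python
import sqlite3
from dataclasses import dataclass

@dataclass
class Product:
    # Solution-side structure; the rest of the app never sees SQL rows.
    name: str
    price: float

class ProductStore:
    # The persistence access subsystem: SQL is confined to this class.
    def __init__(self, con):
        self._con = con

    def cheap_green_scarves(self, limit=100):
        rows = self._con.execute(
            "SELECT prod, price FROM products "
            "WHERE prod='scarves' AND color='green' AND price < ?",
            (limit,))
        # Single conversion point from the RDB view to the solution view.
        return [Product(name, price) for name, price in rows]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE products (prod TEXT, color TEXT, price REAL)")
con.execute("INSERT INTO products VALUES ('scarves', 'green', 25.0)")
result = ProductStore(con).cheap_green_scarves()
print(result)  # [Product(name='scarves', price=25.0)]
```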
Post by Daniel Parker
irrelevant. It's not just a question of conveniencing users with a
syntax that they already know; it's a question of supporting automatic
binding and reusing existing tooling.
In your code generator example, you could easily have multiple drivers
supporting SQL that resolve against very different physical data
sources. (Not to mention XQuery drivers that resolve against the same
sources.)
But why? Surely when one does not have to maintain the code it makes no
difference how one performs the conversion, so SQL standardization is
irrelevant since the application developer will never see SQL. Since
one has to provide a driver for each paradigm anyway in the
transformation engine, why not provide one that efficiently represents
the paradigm directly?
Post by Daniel Parker
Post by H. S. Lahman
Post by Daniel Parker
I think it's fair to say that SQL has, for all its faults, been enormously
successful, to the tune of a multi-multi-billion dollar industry, and that
the UML translationist approach has not. It's been over ten years since the
translationist industry has claimed to have solved the problem of 100
percent translation, but where is it, it's niche, it's nowhere. Other
technologies have arrived, e.g. the W3C XML stack and particularly XSLT
transformation, that dwarf executable UML in application. Why do you think
that is? What do you think it is about software development that makes
executable UML marginal, and other technologies like SQL important?
All the world loves a straight man. B-)
:-)
Post by H. S. Lahman
However, the big translation demo lies in CRUD/USER processing.
Any time one develops an application using a RAD IDE like Access or
Delphi one is essentially using translation. That's a multi-billion
dollar niche that has been around since the '80s.]
Right, but Access doesn't do this with executable UML. I don't think
Access would benefit from going in this direction.
It wouldn't. UML is an OOA/D notation and I think OO is overkill in the
CRUD/USER realm precisely because translation automation is already
provided for most of the stuff OO would deal with in that niche via IDEs
like Access. It was just an example of translation at work in a major
industry segment; I was just addressing your assertion that translation
isn't being used. Access is a translation tool; it is just limited to
CRUD/USER processing while UML is a bona fide general purpose 4GL.
Post by Daniel Parker
Post by H. S. Lahman
A fourth reason is the lack of standardization. Until OMG's MDA effort
all translation tools were monolithic; modeling, code generation,
simulation, and testing were all done in the same tool with proprietary
repositories, AALs, and supporting tools. (Prior to UML, they each had
unique modeling notations as well.) That effectively marries the shop
to a specific vendor. In '95 Pathfinder was the first company to
provide plug & play tools that would work with other vendor's drawing
tools. MDA has changed that in the '00s so now plug & play is a reality.
I grant that we are moving towards a service based world, with
graphical interfaces playing a role in the assembly and orchestration
of services, but I see no evidence that the glue is going to be
executable UML. I don't think that many people care about what the OMG
is doing anymore. Events have overtaken them. I don't think MDA is
very important. There are other evolving standards for plug and play.
Plug & play is an issue for supporting tools. I see monolithic
I-am-the-development-environment tools as being a bar to automation in
general and translation in particular, which was the issue here. MDA
has been very helpful in bringing in the necessary conceptual
standardization.

Meanwhile eUML is an OO development design methodology. An eUML OOA
model for translation should be indistinguishable from an OOA model for
traditional elaboration. [They usually are distinguishable because
elaboration OOA models a typically done in a very sloppy manner because
errors, missing processing, etc. can be "fixed later". You can't get
away with that sort of sloppiness in translation because the code
generator does what you say, not what you meant.]

Translation is just a merger of a rigorous OO design methodology with a
suite of automation tools for the computing space.

I think MDA is very important today. Entire highly-tailorable
development environments like Eclipse have been enabled by it. I agree
it doesn't directly matter a whole lot to application developers because
they still do OOA/D the same way. [Using eUML if they are serious about
doing OOA properly. B-)] But it has a great effect on the tools that
make their lives easier. Standardization breeds plug & play and that
fosters competition and economies of scale through specialization. In
the end developers will be much better off for MDA regardless of whether
they use translation or elaboration. (Note that most of the traditional
round-trip elaboration vendors are Major Players in the MDA initiative.)

Translation just takes that further so that the application developer
never has to worry about details like SQL, EJB, XML, TCP/IP, or a host
of other deterministically defined computing space technologies and
techniques. Those things are completely transparent to the
translationist. Not having to think about stuff like that when solving
the customer's problem improves productivity a great deal.

One final note. Major software houses have been buying traditional
translation vendors for the past 3-4 years to establish strategic
positions in translation. (Some like IBM/Rational/ObjectTime are
two-tiered purchases!) Project Technologies, Mellor's firm that started
it all, was just bought last year. IBM, MS, CA, Mentor et al are all
getting into translation. I believe Pathfinder and Kennedy-Carter are
the only two pure translation venders that are still independent. So
the Deep Pockets Guys seem to think there is a future there. B-)


s***@yahoo.com
2006-01-18 09:12:03 UTC
Permalink
Post by H. S. Lahman
Responding to Jacobs...
Post by topmind
You are so cute when you paint me as bad, manipulative, and evil.
Not bad or evil, but definitely manipulative. You just find it amusing
to pull people's chains and the OO community is providing plenty of soft
targets. As I've said before, I think you actually know a lot more
about OO development than you let on and you are pretty clever about the
way you tweak the OO people who engage with you.
I think you give Bryce too much credit. He's proven himself more than
willing to have long discussions on topics he demonstrably knows
nothing about, consistently refusing to admit his ignorance or
recognize when his feeble arguments have been demolished.

For a good example of this, check out his recent participation in
talk.origins. I started reading that group after the recent spill over
into comp.object. It reminded me of a junior high school anti-drug
class where a very intense DARE instructor told us about a sixteen year
old kid who took PCP and picked a fight with a group of U.S. Marines.
As he lay in the gutter with multiple broken bones, barely conscious
and bleeding to death, he yelled out at the departing jarheads through
shattered teeth "That's it, run away before I kick your asses again!"

I can't imagine a better analogy for Bryce Jacobs on Usenet.

Sean