Responding to Jacobs...
Post by topmind
Post by H. S. Lahman
Post by topmind
Post by H. S. Lahman
Post by topmind
SQL is not an implementation. What is the difference between locking
yourself to SQL instead of locking yourself to Java? If you want
open-source, then go with PostgreSQL. What is the diff? Java ain't no
universal language either.
Of course it's an implementation! It implements access to physical
storage.
Just as Java implements access to physical RAM etc.
Exactly. Java is a specific implementation of a 3GL. 3GL is the
abstraction, Java is an implementation. Persistence access is the
abstraction, SQL is an implementation.
Why do you keep saying "persistence"? I don't think you get the idea of
RDBMS and query languages. Like I said, think of a RDBMS as an
"attribute management system". Forget about disk drives for now. Saying
it is only about "persistence" is simply misleading.
Persistent data is data that is stored externally between executions of
an application. RDBs are a response to that need combined with a
requirement that access be generic (i.e., the data can be accessed by
many different applications, each with unique usage contexts). That's
what DBMSes do -- they manage persistent data storage and provide
generic, context-independent access to that data storage.
My point in this subthread is that such responsibilities are complicated
enough in practice that one does not want the DBMS to also manage and
execute dynamic business rules and policies. IOW, the DBMS should just
mind its own store. [This thread has been a veritable hotbed of puns.
I've probably made more in this thread than I've done in the last
decade. B-)]
Post by topmind
Post by H. S. Lahman
Post by topmind
Post by H. S. Lahman
More important to the context here, that implementation is
quite specific to one single paradigm for stored data.
Any language or API is pretty much going to target a specific paradigm
or two. I don't see any magic way around this, at least not that you
offer. UML is no different.
4GLs get around it because they are independent of /all/ computing space
implementations.
I am not sure UML qualifies as 4th Gen. Just because it can be
translated into multiple languages does not mean anything beyond Turing
Equivalency. C can be translated into Java and vice versa.
A UML OOA model can be implemented unambiguously and without change in a
manual system. In fact, that is a test reviewers use to detect
implementation pollution. The OOA model for, say, a catalogue-driven
order entry system will look exactly the same whether it is implemented
as a 19th century mail-in Sears catalogue or a modern browser-based web
application. That is not true for any 3GL.
Post by topmind
Post by H. S. Lahman
However, that's not the point. SQL is a 3GL but comparing it to Java is
specious because Java is a general purpose 3GL.
Again, this gets into the definition of "general purpose". I agree that
query languages are not meant to do the *entire* application, but that
does not mean it is not general purpose. File systems are "general
purpose", but that does not mean that one writes an entire application
in *only* a file system. It is a general purpose *tool*, NOT intended
to be the whole enchilada.
Huh?!? If you can't write the entire application in it, then it isn't
general purpose by definition.
Post by topmind
A hammer is a general purpose tool, but that does not mean one is
supposed to ONLY use a hammer. You need to clarify your working
definition of "general purpose", and then show how it squares with the
consensus definition of 4GL.
huh**2?!? A hammer is not a general purpose tool by any stretch of the
imagination.
Post by topmind
Post by H. S. Lahman
SQL represents a
solution to persistence access that is designed around a particular
model of persistence itself. So one can't even use it for general
purpose access to persistence, much less general computing.
Please clarify. Something can still be within a paradigm and be general
purpose. Further GP does not necessarily mean "all purpose", for
nothing is practically all purpose.
SQL is designed around the RDB paradigm for persistence. It can't be
used for, say, accessing lines in a text flat file because the text file
does not organize the data the way SQL expects. So SQL is not a
general purpose interface to stored data. Apropos of your point,
though, SQL is quite general purpose for accessing /any/ data in a
uniform way from a data store _organized like an RDB_.
Post by topmind
Post by H. S. Lahman
Post by topmind
Post by H. S. Lahman
Requirements -> 4GL -> 3GL -> Assembly -> machine code executable
Everything on the left is a specification for what is immediately to its
right. Similarly, everything to the right is a solution implementation
for the specification on its immediate left.
Well that is a bit outdated. For one, the distinction between 4GL and
3GL is fuzzy, and many compilers/interpreters don't use assembler.
My 4GL definition isn't ambiguous, which is why I like it. Reviewers of
OOA models have no difficulty recognizing implementation pollution.
Argument by authority.
I prefer to think of it as argument by rational practicality.
Post by topmind
Post by H. S. Lahman
Post by topmind
Post by H. S. Lahman
Go look at an SA/D Data Flow Diagram or a UML Activity Diagram. They
express data store access at a high level of abstraction that is
independent of the actual storage mechanism. SQL, ISAM, CODASYL, gets,
or any other access mechanism, is then an implementation of that generic
access specification.
SQL is independent of the "actual storage mechanism". It is an
interface. You may not like the interface, but that is another matter.
Repeat after me: "SQL is an interface, SQL is an interface, SQL is an
interface"....
Try using SQL vs. flat files if you think it is independent of the
actual storage mechanism. (Actually, you probably could if the flat
files happened to be normalized to the RDM, but the SQL engine would be
a doozy and would have to be tailored locally to the files.) SQL
implements the RDB view of persistence and only the RDB view.
How is that different than ANY other interface? You are claiming magic
powers of UML that it simply does not have.
There is a distinction between describing an interface and designing its
semantics. UML is quite capable of describing the semantics of any
interface. Deciding what the semantics should be is quite another thing
that the developer owns.
When I have a subsystem in my application to access persistent data,
that subsystem has an interface that the rest of the application talks
to. That interface is designed around the rest of the application's
data needs, not the persistence mechanisms. It is the job of the
persistence access subsystem to convert the problem solution's data
needs into the access mechanisms du jour.
If the persistence is an RDB, then the subsystem implementation will
<probably> use SQL. If the persistence is flat text files, it will use
the OS file manager and streaming facilities. If it is clay tablets, it
will use an OCR and stylus device driver API. That allows me to plug &
play the persistence mechanisms without touching the application
solution because it still talks to the same interface regardless of the
implementation of the subsystem.
IOW, the semantics of the interface to the subsystem is /designed/ at a
different level of abstraction than that of the subsystem
implementation. UML doesn't care about the design process; it just
represents the results.
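[The plug-and-play persistence subsystem described above can be
sketched in a few lines; this is an editorial illustration, and all
class and method names here are hypothetical, not from either poster.]

```python
# Minimal sketch of a persistence-access subsystem: the application
# talks to one interface designed around its data needs, and the
# storage mechanism behind it can be swapped without touching the
# problem solution. In-memory stand-ins replace real SQL / file I/O.
from abc import ABC, abstractmethod

class OrderStore(ABC):
    """Interface shaped by the application's needs, not by storage."""
    @abstractmethod
    def save_order(self, order: dict) -> None: ...
    @abstractmethod
    def find_order(self, order_id: str) -> dict: ...

class SqlOrderStore(OrderStore):
    """Would issue SQL against an RDB; a dict stands in here."""
    def __init__(self):
        self._rows = {}
    def save_order(self, order):
        self._rows[order["id"]] = order
    def find_order(self, order_id):
        return self._rows[order_id]

class FlatFileOrderStore(OrderStore):
    """Would use the OS file manager; a list of lines stands in here."""
    def __init__(self):
        self._lines = []
    def save_order(self, order):
        self._lines.append(f"{order['id']}|{order['total']}")
    def find_order(self, order_id):
        for line in self._lines:
            fields = line.split("|")
            if fields[0] == order_id:
                return {"id": fields[0], "total": float(fields[1])}
        raise KeyError(order_id)

def application_logic(store: OrderStore) -> float:
    # The solution never sees SQL, file formats, or clay tablets.
    store.save_order({"id": "A1", "total": 99.5})
    return store.find_order("A1")["total"]

print(application_logic(SqlOrderStore()))       # 99.5
print(application_logic(FlatFileOrderStore()))  # 99.5
```

Either implementation satisfies the same interface, which is the
substitutability being claimed for the subsystem boundary.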
Post by topmind
And as somebody pointed out, one can use SQL on flat files too. ODBC
drivers can be created to hook SQL to spreadsheets, flat files, etc.
Only if the data is organized around embedded identity and normalized.
Even then such drivers carry substantial overhead and tend to be highly
tailored to specific applications. IOW, you need a different driver for
every context (e.g., a spreadsheet) and then it won't be as efficient as
an access paradigm designed specifically for the storage paradigm.
Post by topmind
Post by H. S. Lahman
Post by topmind
Post by H. S. Lahman
Java is certainly a general purpose 3GL. Like most 3GLs there are
situations where there are better choices (e.g., lack of BCD arithmetic
support makes it a poor choice for a General Ledger), but one could
still use it in those situations. SQL, in contrast, is a niche language
that just doesn't work for many situations outside its niche.
You could be right, but I have yet to see a good case outside of
split-second timing issues where there is a limit to the max allowed
response time. (This does not mean that rdbms are "slow", just less
predictable WRT response time.)
If you can give an example outside of timing, please do. (I don't doubt
they exist, but I bet they are rarer than you imply. Some scientific
applications that use imaginary numbers and lots of calculus may also
fall outside.)
Compute a logarithm. You can't hedge by dismissing "scientific"
computations.
I didn't. Nothing is ideal for everything under the sun. Nothing. See
above about general-purpose tools.
Post by H. S. Lahman
Try doing forecasting in an inventory control system w/o
"scientific" computations.
I am not sure what you are implying here. I did not claim that
scientific computation was not necessary.
I was just anticipating your deflection; you've been using the
give-me-an-example ploy for years. B-) When the example is provided
you deflect by attacking it on grounds unrelated to the original point.
That's usually easy to do because examples are deliberately kept
simple to focus on the point in hand. That allows you to bring in
unstated requirements, programming practices designed for other
contexts, and whatnot to attack the example on grounds unrelated to the
original point. In this case, though, you screwed up by setting up a
basis for the deflection ahead of time.
You asked for an example outside of "timing". The main reason SQL isn't
a general purpose 3GL is that it can't handle dynamics (algorithmic
processing) very well. So the obvious examples are going to tend to be
algorithmic, such as computing a logarithm. But your parenthetical
hedge set up a basis for dismissing any obvious example as "scientific"
when you subsequently deflect. Then later you can argue the point was
never demonstrated.
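[To make the "compute a logarithm" example concrete, here is an
editorial sketch of the kind of iterative, algorithmic processing a
general-purpose 3GL expresses naturally but a set-oriented query
language does not; the function name and tolerance are hypothetical.]

```python
# Newton's method solving exp(x) = y for x, i.e. x = ln(y).
# Each iteration refines the guess: x' = x - (1 - y * exp(-x)).
# This loop-until-converged shape is exactly the "dynamics" at issue.
import math

def natural_log(y: float, tolerance: float = 1e-12) -> float:
    x = y if y < 2 else y / 2   # crude initial guess
    while True:
        step = 1 - y * math.exp(-x)
        x -= step
        if abs(step) < tolerance:
            return x

print(abs(natural_log(10.0) - math.log(10.0)) < 1e-9)  # True
```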
Post by topmind
Post by H. S. Lahman
Or try encoding the pattern recognition that
the user of a CRUD/USER application applies to the presented data. The
reality is that IT is now solving a bunch of problems that are
computationally intensive.
As usual, "it depends". Problems where there is a lot of "chomping" on
a small set of data are probably not something DB's are good at (at
this time). An example might be the Traveling Salesman puzzle. However,
problems where the input is large and from multiple entities are more
up the DB's alley.
The Traveling Salesman problem can be arbitrarily large and the RDB
model will still probably not be useful because...
<aside>
FYI, most of the Operations Research algorithms are actually pretty
simple when written out in equations and the core processing doesn't
require a lot of code. Typically most of the code is involved in
getting the data into the application, setting up data structures, and
reporting the results. In addition, the interesting problems are huge
and involve vast amounts of data.
For example, the logistics for the '44 D-Day invasion of Normandy held
the record as the largest linear programming problem ever solved well
into the '70s. The equations for the Simplex solution were written in a
few lines but the pile of data processed was humongous and the actual
execution took months. (It had to be split up into many chunks because
of the MTTF of the computer hardware and a lot of preprocessing was done
by acres of clerks with hand-cranked calculators.)
</aside>
Post by topmind
(It may be possible to use a DB to solve the Salesman problem quickly,
but few bother to research that area.)
Unlikely. It's an NP-complete problem so the worst case always involves
an exhaustive search of all possible tours (i.e., O(N!)). The exotic
algorithms just provide much better /average/ performance. But those
algorithms require data structures that are highly tailored to the
solution. And because of the crunching one wants identity in the form
of array indices, not embedded in tables, or the problem doesn't get
solved in a lifetime.
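[An editorial sketch of the point about array-index identity: in a
brute-force TSP solver the inner loop is pure arithmetic over a
distance matrix, with no table lookups or joins. The matrix values are
made up for illustration; brute force is O(N!) and only feasible for
tiny N.]

```python
# City identity is the array index; dist[i][j] is the cost i -> j.
from itertools import permutations

dist = [
    [0, 2, 9, 10],
    [1, 0, 6, 4],
    [15, 7, 0, 8],
    [6, 3, 12, 0],
]

def brute_force_tsp(dist):
    """Exhaustively try every tour starting and ending at city 0."""
    n = len(dist)
    best = None
    for perm in permutations(range(1, n)):
        tour = (0,) + perm + (0,)
        cost = sum(dist[tour[i]][tour[i + 1]] for i in range(n))
        if best is None or cost < best[0]:
            best = (cost, tour)
    return best

cost, tour = brute_force_tsp(dist)
print(cost, tour)  # 21 (0, 2, 3, 1, 0)
```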
Post by topmind
Post by H. S. Lahman
Post by topmind
Post by H. S. Lahman
BTW, remember that I am a translationist. When I do a UML model, I
don't care what language the transformation engine targets in the model
implementation. (In fact, transformation engines for R-T/E typically
target straight C from the OOA models for performance reasons.) Thus
every 3GL (or Assembly) represents a viable alternative implementation
of the notion of '3GL'.
Well, UML *is* language. It is a visual language just like LabView is.
Exactly. But solutions at the OOA level are 4GLs because they can be
unambiguously implemented without change on any platform with any
computing technologies.
So can any Turing Complete language.
And your point is...?
On separation of concerns of problem solving dynamics vs. data
Post by topmind
Post by H. S. Lahman
Post by topmind
"Separation" is generally irrelevant in cyber-land. It is a physical
concept, not a logical one. Perhaps you mean "isolatable", which can be
made to be dynamic, based on needs. "Isolatable" means that there is
enough info to produce a separated *view* if and when needed. This is
the nice thing about DB's: you don't have to have one-and-only-one
separation/taxonomy up front. OO tends to want one-taxonomy-fits-all
and tries to find the One True Taxonomy, which is the fast train to
Messland. Use the virtual power of computers to compute as-needed
groupings based on metadata.
You know very well what I mean by 'separation of concerns' in a software
context, so don't waste our time recasting it. Modularity has been a
Good Practice since the late '50s.
If there is only one concern set where each concern is mutually
exclusive, then we have no disagreement. In practice there are usually
multiple "partioning" candidates, and that is where the disagreements
usually arise. File and text systems don't make it easy to have
partitioning in all dimensions, so compromises must be made. It is "my
factor is more important than your factor, neener neener". If there is
only one way to slice the pizza, then there is no problem. But if there
are multiple ways, then a fight breaks out.
This is one reason why DB's are useful: the more info you put into the
DB instead of code, the more ad-hoc, situational partitionings you can
view. You are not forced to pick the One Right Taxonomy of
partitioning. Philosophers of categorization came to the consensus that
there is no One Right Taxonomy for most real-world things.
There are three accepted criteria for application partitioning (i.e.,
separating concerns at the scale of subsystems): Subject matter, level
of abstraction, and requirements allocation via client/service
relationships. (BTW, this has nothing to do with OO; it is basic
Systems Engineering.)
Subject matter: Clearly static data storage and providing generic access
to it is a different subject matter than solving Problem X.
Level of abstraction: Outside CRUD/USER processing the detailed
manipulation of data storage (e.g., two-phase commit) is clearly at a
much lower level of abstraction than the algorithmic processing that
solves a particular problem. IOW, the application solution is
completely indifferent to where and how data is stored. One should be
able to solve the problem the same way regardless of what the
persistence mechanisms are. That substitutability means that the
problem solution is at a higher level of abstraction than the
persistence mechanisms.
Requirements Allocation: Clearly the requirements for persistence
implementation and access are quite different than the requirements on
the specific solution of Problem X.
So under all three of these criteria it makes sense to separate the
concerns of persistence from individual problem solutions. That's
exactly what DBMSes do. The problems only come into play when one
violates that separation of concerns and starts bleeding specific
problem solutions into the DBMS itself.
Post by topmind
Post by H. S. Lahman
The RDB paradigm
is not designed for context-dependent problem solving; it is designed
for generic static data storage and access in a context-independent manner.
I think what you view as context-dependent is not really context
dependent after all. It is just your pet way of viewing the world
because of all the OOP anti-DB hype.
My view of context-dependence is the solution to a /particular/ problem.
Each application solves a unique problem. IOW, the problem is the
context. RDBs provide persistence that allows all the applications to
access the data in a uniform way regardless of what specific problem
they are solving.
Whether one can solve the problem in a reasonable fashion with the data
structures mapped to the RDB structure depends on the nature of the
problem. For CRUD/USER processing one can. For problems outside that
realm one can't so one needs to convert data into structures tailored to
the problem in hand.
[Note that this is relevant to the point above about providing SQL
drivers for different storage paradigms. That makes sense for CRUD/USER
environments because one is already employing SQL as the norm. So long
as the exceptions requiring a special driver are fairly rare, one can
justify the single access paradigm. However, it makes no sense at all
for non-CRUD/USER environments because one has to reformat the data to
the problem solution anyway. So rather than reformatting twice, one
should just reformat once from a driver that optimizes for the storage
paradigm.]
Post by topmind
Post by H. S. Lahman
Before you argue that the RAM RDB saves developer effort because it is
largely reusable and that may be worth more than performance, I agree.
But IME for /large/ non-CRUD/USER problems the computer is usually too
small and performance is critical.
Please clarify. Ideally the RDBMS would determine what goes into RAM
and what to disk such that the app developer doesn't have to give a
rat's rear. Cache management generally does this, but a two-way system
is probably not as fast as a dedicated RAM DB. Even if the two-way
ideal is not fully reached, one will soon have the *option* to switch
some or all of an app to a full-RAM DB as needed without rewriting the
app. The query language abstracts/encapsulates/hides that detail away.
This is another non sequitur deflection. Caching and whatnot is not
relevant to the point I was making. There is a business trade-off
between run-time performance and developer development time that every
shop must make. Sometimes greater developer productivity can justify
reusing the RDB paradigm when more efficient specific solutions are
available.
However, my point was that those situations tend to map to CRUD/USER
processing. Once problems become more complex than format conversions
in UI/DB pipeline applications, performance becomes the dominant
consideration. I spent years solving large NP-complete problems on
machines like PDP11s and there was no contest on that issue; customers
simply would not spring for Crays in their systems but they would spring
for a marginal extra developer cost prorated across all systems.
Post by topmind
Post by H. S. Lahman
Post by topmind
Post by H. S. Lahman
For a non-CRUD/USER application, SQL and the DBMS provide the first
relationship while a persistence access subsystem provides the
reformatting for the second relationship.
Reformatting? Please clarify.
The solution needs a different view of the data that is tailored to the
problem in hand. So the RDB view needs to be converted to the solution
view (and vice versa). IOW, one needs to reformat the RDB data
structures to the solution data structures.
This is called a "result set" or "view". Most queries customize the
data to a particular task. Thus, it *is* a solution view.
That formatting is cosmetic. The most sophisticated reformatting is
combining data from multiple tables in a join into a single table
dataset. I am talking about data structures whose semantics are
different, whose access paradigms are different, whose relationships
are different, and/or whose structure is different. IOW, there isn't a
1:1 mapping to the RDB. For example, if my solution requires the data
to be organized hierarchically, SQL queries can't do that.
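[An editorial sketch of the reformatting being described: flat rows,
as a SQL result set would deliver them, rebuilt into the hierarchical
structure a particular solution needs. The rows and column meanings
are made up for illustration.]

```python
# Flat (category, subcategory, item) rows, as from a join.
rows = [
    ("tools", "hand", "hammer"),
    ("tools", "hand", "saw"),
    ("tools", "power", "drill"),
    ("fasteners", "nails", "8d common"),
]

def rows_to_tree(rows):
    """Rebuild the flat result set as a two-level hierarchy."""
    tree = {}
    for category, subcategory, item in rows:
        tree.setdefault(category, {}).setdefault(subcategory, []).append(item)
    return tree

tree = rows_to_tree(rows)
print(tree["tools"]["hand"])  # ['hammer', 'saw']
```

The tabular and tree views carry the same facts, but the solution-side
access paradigm (walk the hierarchy) differs from the RDB one.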
Post by topmind
Post by H. S. Lahman
Post by topmind
Post by H. S. Lahman
When the only tool you have is a Hammer, then everything looks like a
Nail.
No, out of necessity I started my career without DB usage, and I never
want to return there.
That's because you are in a CRUD/USER environment where P/R works quite
well. Try a problem like allocating a fixed marketing budget to various
national, state, and local media outlets in an optimal fashion for a
Fortune 500 company.
Again, I never said that DB's are good for every problem. I don't know
enough about that particular scenario to propose a DB-centric solution
and to know whether it is an exception or not.
Unless you provide some specific use-case or detailed scenario, it is
anecdote against anecdote here.
RDBMS are a common tool. The sales of Oracle, DB2, and Sybase are
gigantic.
Of course they are. They provide a generic, context-independent access
to stored data that any application can use. That's why they exist.
But that is beside the point.
The issue here is where individual business problems should get solved.
My assertion is that is an application responsibility. For CRUD/USER
processing one can use the same data structures in the solution as in
the RDB so P/R as a software development paradigm works well.
Generally, though, one can't use the same data structures once one is
out of the CRUD/USER realm so P/R doesn't work very well.
Post by topmind
Post by H. S. Lahman
Post by topmind
Post by H. S. Lahman
Post by topmind
Post by H. S. Lahman
<moved>What I implied was that CRUD/USER applications tend to be not very
complex. Report generation was never very taxing even back in the COBOL
days, long before SQL, RDBs, or even the RDM. Substituting a GUI or
browser UI for a printed report doesn't change the fundamental nature of
the processing.
Please clarify. If a process was "not taxing", then you are simply
given more duties and projects to work on. Management loads you up
based on your productivity and work-load.
Back in the '60s and early '70s writing COBOL code to extract data and
format reports was a task given to the entry level programmers. That's
where the USER acronym (Update, Sort, Extract, Report) came from. The
stars went on to coding Payroll and Inventory Control where one had to
encode business rules and policies to solve specific problems.
Fine, show how OO better solves business rule management. (Many if not
most biz rules can be encoded as data, BTW, if you know how.)
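[An editorial sketch of topmind's aside that many business rules can
be encoded as data rather than code: a rule table driving a discount
computation instead of hard-coded branches. The thresholds and rates
are hypothetical.]

```python
# Business rule encoded as data: (minimum order total, discount rate).
# Changing policy means editing rows, not recompiling code.
discount_rules = [
    (1000.0, 0.10),
    (500.0, 0.05),
    (0.0, 0.00),
]

def discount_for(total: float) -> float:
    """Apply the highest-threshold rule the order total satisfies."""
    for threshold, rate in sorted(discount_rules, reverse=True):
        if total >= threshold:
            return rate
    return 0.0

print(discount_for(750.0))  # 0.05
```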
Why? That has nothing to do with whether a DBMS should execute dynamic
business rules and policies. This isn't an OO vs. P/R discussion, much
as you would like to make it so.
Are you saying it is a UML-versus-RDB debate?
Another deflection. How do you get from how complex report generation
software is to UML vs. RDB? The topic here has nothing to do with OO,
P/R, or UML. It is about the complexity of processing for CRUD/USER
applications vs. other applications.
*************
There is nothing wrong with me that could
not be cured by a capful of Drano.
H. S. Lahman
***@pathfindermda.com
Pathfinder Solutions -- Put MDA to Work
http://www.pathfindermda.com
blog: http://pathfinderpeople.blogs.com/hslahman
(888)OOA-PATH