Discussion:
ANSWERS to "What's wrong with XQuery" question
Daniela Florescu
2010-07-24 20:59:00 UTC
Hello everybody,

That was a very interesting discussion, thank you all.

I was trying to learn from the feedback we've got for my question
"What's wrong with XQuery?".

I tried to compile the answers into something that I could understand,
and I thought that it might be interesting to share it.

I did put together a list of citations from this thread, together with
the author's email address.

Hope that helps, have a good weekend,
Dana


========================================================================
========================================================================


A. Language issues
=================

A.1) General limitations of the language
-------------------------------------------------------

**** If your input xml is in no namespace, and your output xml changes
the default namespace (such as a bog standard xml -> xhtml transform)
then you hit that massive issue of the xpath default namespace change.

andrew.j.welch-***@public.gmane.org

**** In a "process it if it's there" scenario, which is very common in
xml transforms, you have to constantly check if the input is there before
creating the output structure:

andrew.j.welch-***@public.gmane.org

**** In XQuery, I have to jump through hoops with
closures to return two sequences from a function

int19h-***@public.gmane.org

**** I can't describe types in XQuery
itself, and XML Schema is such a pain to deal with, with multiple
limitations and quirks, not to mention a completely different syntax.

int19h-***@public.gmane.org

**** dynamic typing in XQuery is even more lax than usual

int19h-***@public.gmane.org

**** the lack of type polymorphism is also limiting for a
language with explicit type declarations.

int19h-***@public.gmane.org


**** lacking virtual function dispatch based on the argument types

sokolov-vDdCxMo47w5Wk0Htik3J/***@public.gmane.org

A.2) Missing useful features from XSLT
------------------------------------------------------

**** It seems that apply-templates and template/@match, so far, cannot
be done against an XML DB.

rob-/p3dT1ntlUHQT0dZR+***@public.gmane.org

**** you could have something analogous to apply-templates (at least
for element types), even if not the full pattern-matching.

sokolov-vDdCxMo47w5Wk0Htik3J/***@public.gmane.org

**** Even something like a simple "match" statement, analogous to
switch but matching against patterns, would significantly improve
XQuery transformations

int19h-***@public.gmane.org

**** The recursive typeswitch is no substitute for XSLT's recursive
descent...

andrew.j.welch-***@public.gmane.org

A.3) Not complete as a programming language
-----------------------------------------------------------------

**** XQuery is Turing complete, but other than that, it is not a
complete programming language

mail-2udWM7XjvTsOyoagqWviFgC/***@public.gmane.org

**** in 1.0 (the stable spec so far) we have a pure, declarative
language which doesn't have functions as first-class values. So it's
neither truly functional nor at all imperative.

int19h-***@public.gmane.org

**** For a general-purpose language, that's weak - how am I supposed
to write concise code, or reuse it, to
the same extent I'm used to in other modern languages?

int19h-***@public.gmane.org


**** XQuery isn't a general purpose language - it doesn't
provide a complete programming environment, for example through the
lack of data types.

mail-2udWM7XjvTsOyoagqWviFgC/***@public.gmane.org

**** one very quickly runs into needing to use vendor specific
extensions/hacks

jdmitchell-***@public.gmane.org

B. Architectural issues
======================

B.1) Does not serve the semi-structured / programmatic data camp well
----------------------------------------------------------------------

**** the need for good languages which support the growth of
semi-structured and what I've been calling ad hoc structured data models

jdmitchell-***@public.gmane.org

**** there is data exchanged for programmatic purposes, where XML
sucks and JSON wins.

mail-2udWM7XjvTsOyoagqWviFgC/***@public.gmane.org

**** XML is good (though overcomplicated) for trees, tolerable though
bloated for linear data, and horrible for free-form graphs.

int19h-***@public.gmane.org

**** XML is great for markup
and semi-structured content, but it's really no good data model to
program against generally. Part of this comes from the difference
between node-labeled trees and edge-labeled trees, but mostly it is
that XML exposes a lot of underlying complexity necessary for markup
use cases, and even if you try to abstract that, it will leak.

mail-2udWM7XjvTsOyoagqWviFgC/***@public.gmane.org

B.2) Does not yet integrate well with client software
---------------------------------------------------------------------

**** we need to make XML work better inside of existing Web 2.0 AJAX
frameworks by adding a MVC abstraction

awspyker-***@public.gmane.org

**** With JavaScript getting faster, I reiterate my idea that a
JavaScript library be built.

brettz9-/***@public.gmane.org

B.3) XML and JSON
-------------------------------

**** how to create an XML-to-JSON reversible lossless translation that
results in "good" markup on both sides

dlee-***@public.gmane.org

**** the future is going to see XQuery as a specialty
technology used in content management applications, and JSON databases
taking market share from relational databases.

mail-2udWM7XjvTsOyoagqWviFgC/***@public.gmane.org

**** how are you going to tweak the xquery language to be json focused
(or at least parity) w.r.t. xml?

jdmitchell-***@public.gmane.org

**** JSON is "good enough" to
generalize XML. And one could reasonably argue that it makes more
sense that way, because there are noticeably fewer concepts in JSON
than there are in XDM (or Infoset).

int19h-***@public.gmane.org

C. Performance
=============

**** I don't think the optimizers are good enough to handle it
automagically on their own in all cases, either.

int19h-***@public.gmane.org

**** Performance behavior should be predictable across different
implementations

int19h-***@public.gmane.org

**** you are the optimizer, i.e. fiddle with it until you get the best
performance.

sokolov-vDdCxMo47w5Wk0Htik3J/***@public.gmane.org

**** does xquery really lend itself to that sort of optimization?

sokolov-vDdCxMo47w5Wk0Htik3J/***@public.gmane.org


D. Portability
===========

**** You spend the rest of the day massaging it so that it actually
runs with reasonable performance. And then you switch the backend and
find out that your query plan is totally different.

int19h-***@public.gmane.org

**** some way to explicitly
request eager and - especially! - lazy behavior would be extremely
handy for portable code.


int19h-***@public.gmane.org

**** portability is
important if you want third-party libraries and frameworks to appear
(and not immediately fragment between implementations).

int19h-***@public.gmane.org

E. Existing XQuery-related Software
============================

**** there are still those of us devs who have our own personal sites
and want a toy to play with, that will remain ours and don't need to
pay anything extra for it, especially if we're mostly just learning it
to use it for hobby coding or small existing sites, train ourselves,
etc. I think it would be great to have both options available...

brettz9-/***@public.gmane.org

**** People want stuff that's superfast and scalable but don't want to
have to pay for it.

jdmitchell-***@public.gmane.org

**** the XQuery implementation(s) aren't part of any of the standard,
el-cheapo shared hosting bundles

jdmitchell-***@public.gmane.org

**** so many vendors held various aspects of the language, libraries,
tooling hostage for their own benefit/purposes

jdmitchell-***@public.gmane.org

**** There's the vendor lock-in problems.

jdmitchell-***@public.gmane.org

F. Tools
=======

**** Development tools for other languages are cheap or even free, yet
powerful and full-featured.

int19h-***@public.gmane.org

G. Libraries
=========

**** Libraries and frameworks? You get an implementation, typically
on top of an XML database, providing various proprietary frameworks of
its own tied to that implementation. So if you want to write a
full-fledged app, you get vendor lock-in all the way down.

int19h-***@public.gmane.org

**** the development of good libraries/frameworks is hampered
by language limitations

int19h-***@public.gmane.org

H. Marketing, standards, positioning and perception
=========================================

**** XQuery is just *not* an interesting topic. It had its hype
heyday and it didn't capitalize on that and so its mindshare (and
hypeshare) are low.

jdmitchell-***@public.gmane.org

**** Even if we were to stipulate that technically it is a general
purpose programming language, the perception of basically everybody is
that it's not.

jdmitchell-***@public.gmane.org

**** Nobody submitted a session on xquery.

jdmitchell-***@public.gmane.org

**** XQuery suffers from it being tied so directly with all of the
hype of "XML".

jdmitchell-***@public.gmane.org

**** XQuery suffers from its history in the standardization by
committee hell.

jdmitchell-***@public.gmane.org

**** it took forever for there to be a standard

jdmitchell-***@public.gmane.org

**** The "XML is evil" problem.

jdmitchell-***@public.gmane.org

**** Even with all of the latest updates to xquery, it's 2010 and
frankly it's still a mess.

jdmitchell-***@public.gmane.org

**** The only advantage of XQuery that I see is its concise syntax.
Everything else is mediocre or sub-par.

int19h-***@public.gmane.org

**** XQuery with its strange data model, its mix of syntactic styles,
its odd combination of weak and strong type checking, and its confused
positioning between a database query language, an XML transformation
language, and a general-purpose web programming language.

mike-JkSD5nQpfvpWk0Htik3J/***@public.gmane.org
Pavel Minaev
2010-07-24 21:42:05 UTC
Thank you, Dana.

A slight amendment:

> **** Performance behavior should be predictable over
> different implementations

To be more precise, the lower bound should be predictable over
different implementations - "must be at least this good, but if you
can do it better, great".

It's an important difference, because enforcing the same performance
would inhibit creative optimizations, and that's the last thing I
want. In that respect, the degree of freedom that the XQuery spec gives to
the optimizer is a point in favor of the language. It just needs to
mandate a particular reasonable minimal complexity for certain
operations.

Maybe the best way to do it would be to look at the existing
implementations, and see if there are any points on which they all
agree. E.g. for something like this:

(for $i in 1 to 1000000 return $i * $i)[5]

If it turns out that all existing implementations out there guarantee
that at most 5 elements in (1 to 1000000) will actually be evaluated
when computing this - which I strongly suspect to be the case - then
perhaps it should be the minimum mandated by the spec. If some
implementations can optimize this further, and simplify this down to
5*5, they remain conformant, of course.
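
To make the difference concrete, here is a minimal sketch in Python rather than XQuery (so it says nothing about how any actual XQuery engine behaves). The names `squares`, `nth` and the `log` list are invented for this illustration; the log simply records which values of the range were actually evaluated under a lazy and an eager strategy.

```python
from itertools import islice

def squares(n, log):
    """Lazily yield i*i for i in 1..n, recording each evaluated i in log."""
    for i in range(1, n + 1):
        log.append(i)
        yield i * i

def nth(items, position):
    """Return the item at a 1-based position, pulling no more items than needed."""
    return next(islice(items, position - 1, position))

# Lazy strategy: the positional filter drives the evaluation.
lazy_log = []
lazy_result = nth(squares(1_000_000, lazy_log), 5)

# Eager strategy: the whole sequence is materialized before filtering.
eager_log = []
eager_result = list(squares(1_000_000, eager_log))[5 - 1]

print(lazy_result, len(lazy_log))
print(eager_result, len(eager_log))
```

Both strategies return the same value; only the amount of work differs, which is exactly the part the spec currently leaves open.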

Similarly, if all existing implementations guarantee tail recursion
optimization, and the sets of call positions they consider to be
tail-recursive have a non-empty intersection, then that intersection
defines the "obvious tail recursive positions", which can also be
mandated by the spec.

This reminds me of the story of std::string in C++. In C++98, it was not
required to be contiguous in memory, supposedly giving implementations
freedom to be creative. This was a hindrance to programmers, however,
who often needed that guarantee of contiguity for convenient and
efficient code. Ten years later, when the C++ working group did a
survey of existing implementations, they didn't find a single one with
non-contiguous std::string, and also found a lot of code relying on
strings being contiguous despite it not being guaranteed by the spec -
and so they went ahead and added it as a requirement to the C++
Standard.
Michael Kay
2010-07-24 22:22:27 UTC
> To be more precise, the lower bound should be predictable over
> different implementations - "must be at least this good, but if you
> can do it better, great".
>
> It's an important difference, because enforcing the same performance
> would inhibit creative optimizations, and that's the last thing I
> want. In that respect, the degree of freedom that XQuery spec gives to
> the optimizer is a point in favor of the language. It just needs to
> mandate a particular reasonable minimal complexity for certain
> operations.
>
> Maybe the best way to do it would be to look at the existing
> implementations, and see if there are any points on which they all
> agree. E.g. for something like this:
>
> (for $i in 1 to 1000000 return $i * $i)[5]
>
> If it turns out that all existing implementations out there guarantee
> that at most 5 elements in (1 to 1000000) will actually be evaluated
> when computing this - which I strongly suspect to be the case - then
> perhaps it should be the minimum mandated by the spec. If some
> implementations can optimize this further, and simplify this down to
> 5*5, they remain conformant, of course.
>
>

I invite anyone to attempt this, but I think it will be extremely
difficult. Anyone who's seen how much trouble we've had with the "errors
and optimization" section in the spec would be wary of writing a
"performance and optimization" section that actually makes testable
assertions that implementations are capable of conforming to. (And if
it's not a testable assertion, then it has no place in the language spec.)

Statements like "will actually be evaluated" turn out to be rather
difficult to test, and they aren't always meaningful in the context of a
particular implementation. What do you do, for example, if the 1000000
is actually $N, and [5] is actually [$M], and $N is known statically,
and the implementation decides to evaluate

(for $i in 1 to $N return $i * $i)

at compile time into an array? You then get constant performance
(independent of $M or $N) at run-time but a higher cost at compile time
and more memory used by the compiled code. Is the spec really going to
say whether or not that's an acceptable strategy for implementors to adopt?
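
A rough sketch of that strategy, in Python rather than XQuery and with invented names (`compile_query`, `run`), might look as follows. It assumes N is known when the query is compiled; `None` stands in for the empty sequence that an out-of-range positional predicate would give.

```python
def compile_query(n):
    """'Compile time': evaluate the whole for-expression once into a table."""
    table = tuple(i * i for i in range(1, n + 1))  # cost and memory grow with n

    def run(m):
        # 'Run time': one bounds check and one array access, whatever m and n are.
        if 1 <= m <= len(table):
            return table[m - 1]
        return None  # stands in for the empty sequence

    return run

query = compile_query(1000)
print(query(5), query(1000), query(1001))
```

Nothing is evaluated lazily here, and more than M items are computed, yet every run-time lookup is constant-time. A rule phrased as "at most M elements will be evaluated" would rule this out.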

I agree predictability of performance is a very significant problem -
not just across products, but within a single product. Part of the
answer, I think, is to make performance less reliant on good
optimization. In XSLT, the key() function goes a long way towards this:
by giving programmers a tool to control when indexes are built and used,
performance of many join constructs becomes much more predictable even
though the spec doesn't actually mandate how key() is implemented
(no-one would buy a product that implemented it as a serial search).
I've always felt that the anathema felt in the database query community
towards such constructs is misplaced - although it's great when
optimizers are good enough that they aren't needed, I've seen
programmers tearing their hair out trying to second-guess the optimizer,
and in such cases it's not clear we're doing programmers a service.
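
As an illustration of why an explicit key makes join performance predictable, here is a small Python analogy. It is not XSLT, and `build_key`, `join_serial` and `join_keyed` are names made up for this sketch. The serial join scans every customer for every order; the keyed join builds an index once and then does a direct lookup per order.

```python
from collections import defaultdict

def build_key(nodes, use):
    """Build the index once: map each key value to the nodes that have it."""
    index = defaultdict(list)
    for node in nodes:
        index[use(node)].append(node)
    return index

customers = [{"id": 1, "name": "Ann"}, {"id": 2, "name": "Bob"}]
orders = [
    {"customer": 2, "total": 30},
    {"customer": 1, "total": 12},
    {"customer": 2, "total": 5},
]

def join_serial(orders, customers):
    # Nested loop: work grows with len(orders) * len(customers).
    return [(c["name"], o["total"])
            for o in orders for c in customers if c["id"] == o["customer"]]

def join_keyed(orders, customers):
    # Index built once, then one dictionary lookup per order.
    by_id = build_key(customers, lambda c: c["id"])
    return [(c["name"], o["total"])
            for o in orders for c in by_id.get(o["customer"], [])]

print(join_serial(orders, customers))
print(join_keyed(orders, customers))
```

The two functions return the same pairs. The difference is that in the keyed version the programmer, not the optimizer, decided that an index would be used.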

Michael Kay
Martin Probst
2010-07-25 13:51:57 UTC
> Part of the answer, I think, is to make performance less reliant on good
> optimization. In XSLT, the key() function goes a long way towards this:
> by giving programmers a tool to control when indexes are built and used,
> performance of many join constructs becomes much more predictable
> [...]
> I've always felt that the anathema felt
> in the database query community towards such constructs is misplaced -
> although it's great when optimizers are good enough that they aren't needed,
> I've seen programmers tearing their hair out trying to second-guess the
> optimizer, and in such cases it's not clear we're doing programmers a
> service.

This is indeed somewhat funny. A huge part of the value proposition of
databases is that you write your application logic independent of
storage and lookup considerations. After an application is developed
to be correct, someone knowledgeable is supposed to tune the database
a bit, build some indexes, and everything is fine.

At the same time, the reality often seems to be that many teams
struggle with database performance a lot, because the assumption that
you can do optimizations independently and after application
development doesn't hold all that often. Code that does something in
an O(n) or even O(n**2) fashion tends to be actually incorrect in many
settings, not just a bit slower.

I've seen people actually write tests for their database queries to
make sure they get a reasonable execution plan, but that is extremely
hard to do (assertions on a query plan).

It might be nice to have language constructs saying "guarantee to me
that you do this in O(something), otherwise fail".
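
One very rough way to picture such a construct, sketched in Python with hypothetical names (`guarded`, `BudgetExceeded`, `find_first`): a wrapper that lets a computation proceed but fails as soon as it has consumed more input than a stated budget. Note this is only a run-time check on the work done, not a static guarantee about complexity class, which is what a real language construct would presumably need.

```python
class BudgetExceeded(Exception):
    """Raised when a guarded computation pulls more items than allowed."""

def guarded(items, max_steps):
    """Yield items unchanged, but fail once more than max_steps have been pulled."""
    for steps, item in enumerate(items, start=1):
        if steps > max_steps:
            raise BudgetExceeded(f"more than {max_steps} items consumed")
        yield item

def find_first(items, predicate):
    """Return the first item satisfying predicate, or None if there is none."""
    return next((x for x in items if predicate(x)), None)

# Stays within budget: the match is found after 7 items.
ok = find_first(guarded(range(1, 1001), max_steps=10), lambda x: x == 7)

# Exceeds the budget: the match would need 500 items, so the guard fails.
try:
    find_first(guarded(range(1, 1001), max_steps=10), lambda x: x == 500)
    failed = False
except BudgetExceeded:
    failed = True

print(ok, failed)
```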

Martin
Daniela Florescu
2010-07-25 17:16:44 UTC
>
>
> It might be nice to have language constructs saying "guarantee to me
> that you do this in O(something), otherwise fail".

Yes, that would be very useful. Some sort of performance guards.

Best regards
Dana
Michael Sokolov
2010-07-25 18:05:12 UTC
On 7/25/2010 9:51 AM, Martin Probst wrote:
>> Part of the answer, I think, is to make performance less reliant on good
>> optimization. In XSLT, the key() function goes a long way towards this:
>> by giving programmers a tool to control when indexes are built and used,
>> performance of many join constructs becomes much more predictable
>> [...]
>>
>>
> It might be nice to have language constructs saying "guarantee to me
> that you do this in O(something), otherwise fail".
>

That's exactly right - for applications with sufficient scale, it's just
not enough to know that a given expression will be evaluated correctly:
it's also critical to understand whether indexed lookup will be applied
so as to guarantee completion (or failure) before the universe ends. I
heartily advocate tools (such as xslt's key()) that enable programmers
to communicate this sort of requirement to the query evaluator.

This already does exist (in vendor-specific extensions) in various
xquery database implementations that provide query extension functions
which allow explicit invocation of index lookups, for both full-text
queries, and also in some cases for typed value-based (range) queries.
As an aside, a related feature that is critical is the ability for the
query-writer to profile queries at a fine grain and/or have some
visibility into the execution plan.

I'm new to this discussion, so I'm sure that I'm unaware of a lot of
what has gone on, so please forgive me if I'm asking for something that
already exists! Over the past few years I've been devoted to
implementing large systems using xquery and have struggled at times with
getting predictable performance, especially across multiple platforms,
so I offer the wish list of an implementor.

It would be a good idea to have some agreement on a standard for how to
query using indexes, and possibly how to create them as well. It does
seem to me that xslt key() is pretty close to getting it right: I might
prefer to add a few features though :)

1) the ability to query the values of the index keys and their
statistics (for faceting, distinct-values, max and the like); ideally
this could be restricted by an orthogonal sequence of some sort (i.e.,
given a sequence of nodes, how many have each key value?)
2) Specifying type and collation information for indexes (and queries)
to enable range queries
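
To make the two wishes a little more tangible, here is a small Python sketch of what such an index interface could offer. The class `KeyIndex` and its methods are purely hypothetical and do not correspond to any existing XQuery or XSLT API; typed keys are modelled by Python's own ordering of integers, and collations are left out entirely.

```python
from bisect import bisect_left, bisect_right
from collections import Counter, defaultdict

class KeyIndex:
    """Hypothetical key index with lookup, facet counts and range queries."""

    def __init__(self, nodes, use):
        self.use = use
        self.by_key = defaultdict(list)
        for node in nodes:
            self.by_key[use(node)].append(node)
        self.sorted_keys = sorted(self.by_key)  # needed for range queries

    def lookup(self, value):
        return self.by_key.get(value, [])

    def facets(self, subset=None):
        """Count nodes per key value, over the whole index or a given subset."""
        if subset is None:
            return {key: len(nodes) for key, nodes in self.by_key.items()}
        return dict(Counter(self.use(node) for node in subset))

    def range_query(self, low, high):
        """Return nodes whose key lies in the inclusive range [low, high]."""
        lo = bisect_left(self.sorted_keys, low)
        hi = bisect_right(self.sorted_keys, high)
        return [n for key in self.sorted_keys[lo:hi] for n in self.by_key[key]]

books = [
    {"title": "A", "year": 2001},
    {"title": "B", "year": 2005},
    {"title": "C", "year": 2005},
    {"title": "D", "year": 2010},
]
index = KeyIndex(books, lambda b: b["year"])
print(index.facets())
print(index.facets(subset=books[1:]))
print([b["title"] for b in index.range_query(2002, 2010)])
```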

I understand there is xquery full-text which may address some (or all?)
of my concerns. I hope you will forgive me for not being completely au
courant regarding that: as an implementor, I'm focused on what's
available now and in the near future. But my quick read of that spec
turns up only a few references to indexing, none of which seem normative.

-Mike
Michael Rys
2010-07-26 05:05:52 UTC
Hi all

While doing research into performance is good and most XQuery implementations including ours can benefit from better optimizations, I would strongly oppose "minimal performance requirements" in the standard. The standard defines the semantics and not how long it takes to get the result.

Now the one thing we should look at in the standard is whether we have expressions that are not easily optimizable and - if possible - not define them... like the in-scope namespace binding mess... or that /a/b/c requires rather expensive aggregation of all text nodes... Unfortunately, most of these "warts" were introduced in XPath 1.0 in order to make the language more usable... and not necessarily more optimizable...

Best regards
Michael

Pavel Minaev
2010-07-26 06:23:04 UTC
On Sun, Jul 25, 2010 at 10:05 PM, Michael Rys <mrys-***@public.gmane.org> wrote:
> While doing research into performance is good and most XQuery implementations including ours can benefit from better optimizations, I would strongly oppose "minimal performance requirements" in the standard. The standard defines the semantics and not how long it takes to get the result.

It seems to be the consensus among implementers, so far as I can see,
so perhaps that proposal was too rushed. Let's follow up on the other
idea.

> Now the one thing we should look at in the standard is whether we have expressions that are not easily optimizable and - if possible - not define them... like the in-scope namespace binding mess... or that /a/b/c requires rather expensive aggregation of all text nodes... Unfortunately, most of these "warts" were introduced in XPath 1.0 in order to make the language more usable... and not necessarily more optimizable...

The other side of this is to add new expressions in the standard which
_are_ easily optimizable (and which today have to be represented with
a more elaborate construct that may not be so obviously optimizable).
Which gets us back to that discussion about something along the lines
of XSLT key() in XQuery - to give one idea, as surely there are plenty
more.
Michael Kay
2010-07-26 08:18:24 UTC
> Now the one thing we should look at in the standard is whether we have expressions that are not easily optimizable and - if possible - not define them... like the in-scope namespace binding mess... or that /a/b/c requires rather expensive aggregation of all text nodes... Unfortunately, most of these "warts" were introduced in XPath 1.0 in order to make the language more usable... and not necessarily more optimizable...
>
>

One thing I learnt from James Clark while he was handing over the baton
on XSLT is that reasoning about optimizability while a language is under
development is fraught with danger. The best thing you can do to ensure
optimizability is to make the semantics clean and orthogonal. Features
added to languages in the interests of implementors often have exactly
the opposite effect (an example I cite is the rule in XML Schema that
says a restriction of an xs:all group must retain the order of the
elements from the original. The spec actually says that this restriction
is there for the convenience of implementors; but in order to enforce
this restriction, I have to distort my implementation to retain the
order of elements in an xs:all group, which would otherwise not be
necessary.) I have often made the mistake of opposing or proposing
things in the spec with implementation arguments in mind, only to find
that the implementation arguments evaporated when I thought harder about
it. (I remember a panic when I realized that Unicode codepoint collation
is not the same as byte-by-byte sorting of UTF16 strings. That made me
think about how I was comparing strings, and after thinking about it, I
found a way that was faster than Java String.equals(), as well as
conforming better to the spec. That's not an isolated experience.)
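
The mismatch between the two orderings is easy to reproduce. The following Python sketch (the helper names `codepoint_key` and `utf16_key` are made up here) sorts the same two one-character strings by Unicode code point and by their big-endian UTF-16 encoding, which orders strings the same way as comparing 16-bit code units. A character outside the Basic Multilingual Plane is encoded as a surrogate pair starting at D800, so it sorts before BMP characters at E000 and above in UTF-16 order, but after them in code point order.

```python
def codepoint_key(s):
    """Sort key following Unicode code point order."""
    return [ord(ch) for ch in s]

def utf16_key(s):
    """Sort key following UTF-16 code unit order (big-endian bytes)."""
    return s.encode("utf-16-be")

a = "\uff5e"      # U+FF5E, a single UTF-16 code unit FF5E
b = "\U00010000"  # U+10000, encoded as the surrogate pair D800 DC00

by_codepoint = sorted([b, a], key=codepoint_key)
by_utf16 = sorted([b, a], key=utf16_key)

print(by_codepoint == [a, b])  # code point order puts U+FF5E first
print(by_utf16 == [b, a])      # UTF-16 order puts U+10000 first
```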

So I've learnt that it's usually best to design the language for the
convenience of users, not of implementors.

One of the problems we have in this space (and it continues, with things
like precisionDecimal) is that we're designing a query language over a
data model that we don't control, and which is largely designed with
objectives other than query in mind. That's how the namespace mess
arose. On the other hand, there are merits in the approach - I guess if
query language people had been designing XML, we wouldn't have had mixed
content, and we would thereby have eliminated a large chunk of the
justification for using XML in preference to other data models.

Michael Kay
Saxonica
Xavier Franc
2010-07-25 18:18:08 UTC
> **** Performance behavior should be predictable over
> different implementations

I think this is a fairly strange requirement:
is it the problem of the language or
is it a problem with some implementations?

If some implementations offer poor performance,
then why not just drop them?
If you find for example a C++ compiler which
produces code 10 times slower than other compilers,
are you going to insist on changing the specs of C++?
Won't you just choose another compiler?

> (for $i in 1 to 1000000 return $i * $i)[5]
>
> If it turns out that all existing implementations out there guarantee
> that at most 5 elements in (1 to 1000000) will actually be evaluated
> when computing this - which I strongly suspect to be the case -
This is definitely not the case.
I know at least 2 popular implementations that don't (no names given),
and I suspect there are more.

This is a detail, but it indeed indicates that there are
appreciable differences between implementations currently.

Rather than overspecifying a language which is already
bloated, perhaps it's users of XQuery implementations who should
make more efforts to select better products, if they care about
speed. Perhaps a good independent benchmark would help.
But is it going to happen?


--
Xavier Franc
Pavel Minaev
2010-07-25 19:25:09 UTC
On Sun, Jul 25, 2010 at 11:18 AM, Xavier Franc <xavier_f-1g3y2VubmHnQT0dZR+***@public.gmane.org> wrote:
>> **** Performance behavior should be predictable over
>> different implementations
>
> I think this is a fairly strange requirement:
> is it the problem of the language or
> is it a problem with some implementations?
>
> If some implementations offer poor performance,
> then why not just drop them?
> If you find for example a C++ compiler which
>  produces code 10 times slower than other compilers,
>  are you going to insist on changing the specs of C++?
> Won't you just choose another compiler?

Thing is, 10 times slower than O(1) is still O(1). It scales the same.
That is the more crucial part.

>>    (for $i in 1 to 1000000 return $i * $i)[5]
>>
>> If it turns out that all existing implementations out there guarantee
>> that at most 5 elements in (1 to 1000000) will actually be evaluated
>> when computing this - which I strongly suspect to be the case -
> This is definitely not the case.
> I know at least 2 popular implementations that don't (no names given),
> and I suspect there are more.
>
> This is a detail, but it indeed indicates that there are
> appreciable differences between implementations currently.
>
> Rather than overspecifying a language which is already
> bloated, perhaps it's users of XQuery implementations who should
> make more efforts to select better products, if they care about
> speed. Perhaps a good independent benchmark would help.
> But is it going to happen?

My take on this is not so much from a user's perspective as from a
library developer's perspective. Let's say that I want to write some
code reusable between implementations. Can I refactor a single query
into several functions (say, for the sake of readability), applied in
a chain, such that the first invocation does some filtering, the
second one does some expensive per-item processing, and the last one
does more filtering? If I know that evaluation of sequences is
pervasively lazy, then the answer is "yes" - since the second function
will only process items that are demanded by the third function. If I
don't have such a guarantee, then such code may have a very drastic
performance difference across various implementations. As a library
developer, quite obviously, I cannot afford that - I want the library
to be useful to as many people out there as possible - so I rewrite it
in a less readable way, with everything jammed in together, so that
any implementation can (or rather I hope it can, since I _still_ don't
have any guarantees!) do it right.

Alternatively, I can put additional prerequisites onto the authored
library, aside from a conformant XQuery 1.x implementation - for
example, I can further specify that implementation must have lazy
sequence evaluation for the library to actually be usable. But then
I'm doing precisely what I suggested doing in this thread, except that
now every author does it for himself, and their sets of requirements
above spec conformance need not even match...
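
The scenario can be sketched in Python (not XQuery; all function names below are invented for the illustration). The same three-stage chain is run twice: once with generators passed straight through, which models pervasively lazy sequences, and once with every intermediate result materialized into a list, which models an eager implementation. The `calls` list counts how often the expensive middle stage actually runs.

```python
from itertools import islice

def keep_even(items):
    """Stage 1: cheap filtering."""
    return (x for x in items if x % 2 == 0)

def make_expensive_square(calls):
    """Stage 2: per-item processing; each processed item is recorded in calls."""
    def expensive_square(items):
        for x in items:
            calls.append(x)
            yield x * x
    return expensive_square

def first_multiples_of_three(items, count):
    """Stage 3: more filtering, keeping only the first `count` matches."""
    return list(islice((y for y in items if y % 3 == 0), count))

def run(materialize):
    """Run the chain; `materialize` is applied to each intermediate sequence."""
    calls = []
    stage2 = make_expensive_square(calls)
    step1 = materialize(keep_even(range(1, 101)))
    step2 = materialize(stage2(step1))
    return first_multiples_of_three(step2, 3), len(calls)

lazy_result, lazy_calls = run(lambda seq: seq)  # generators all the way down
eager_result, eager_calls = run(list)           # every stage fully evaluated

print(lazy_result, lazy_calls)
print(eager_result, eager_calls)
```

The results are identical, but the lazy chain stops calling the expensive stage as soon as the third function has what it needs, while the eager chain processes every item that passed the first filter. That gap is what a library author cannot currently rely on across implementations.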
Daniela Florescu
2010-07-26 05:46:28 UTC
Xavier,


>
> If you find for example a C++ compiler which
> produces code 10 times slower than other compilers,
> are you going to insist on changing the specs of C++?
> Won't you just choose another compiler?

Switching between databases ain't as easy as switching between
compilers.

>
>> (for $i in 1 to 1000000 return $i * $i)[5]
>>
>> If it turns out that all existing implementations out there guarantee
>> that at most 5 elements in (1 to 1000000) will actually be evaluated
>> when computing this - which I strongly suspect to be the case -
> This is definitely not the case.
> I know at least 2 popular implementations that don't (no names given),
> and I suspect there are more.

How much to know the names? Just kidding :-)
(just crossing my fingers that it is not one of mine...)

> Perhaps a good independent benchmark would help.
> But is it going to happen?

Hmmm. It's hard.

Relational databases had at the beginning a clear, unique target: bank
transactions.

Everything was built around that, so it was relatively easy to build
a benchmark that was simulating that.

XQuery users appear to be all over the place:

1. Pub-Sub,
- hundreds of thousands of path expressions for routing
- small messages, but hundreds of thousands per XX of them

2. Message transformations for Web services
- queries usually larger in size than the data they apply to
- small messages, but hundreds of thousands per XX of them

3. Querying relational data with flexible fields
- large database
- queries relational in style, but no fixed schema

4. Data integration across multiple data sources
- large database, but distributed and non-homogeneous
- queries relational in style, but no fixed schema

5. Large content databases
- large mixed content databases
- queries with lots of full text search

6. Web mashups
- data obtained from REST or WS
- just linked, glued and integrated with XQuery

7. End-to-end information processing apps (my favorite)

Etc. Etc. Etc.

Each one of those scenarios needs a different benchmark.

It will take a while, but I know of several pieces of work in this
direction.

In general, yes, a benchmark is a very useful tool to speed up a
technology.

When vendors start competing on numbers, it starts to be fun :-)

Best regards
Dana
James Fuller
2010-07-25 18:16:16 UTC
Permalink
thanks for the summary Daniela, I am picking up this thread late.

I don't have any conclusions, but wanted to add a few random thoughts.

* all xml based technologies got seriously affected by the latest war
over what HTML should be ... in some ways this war is worse than the
classic browser wars in the 90's as it is developer communities pitted
against each other rather than commercial entities staking out lockin.
Maybe commercial entities have gotten more savvy but the end result is
that significant portions of the developer community are opposed to
XML, mostly due to issues of perception rather than technical analysis
... I could mention DOM but that's another permathread.

* For many web developers, working with XSLT was always via some
'host' language which gave them the impression that XSLT was not
standalone. XSLT is a wonderful language and since 2000 I have done
all sorts of obscene things that you shouldn't do with it ... but let's
face it, it's an order of magnitude easier to teach a developer XQuery and there
will always be those who comment on XSLT verbosity. There have been
some saying why not put all of XSLT goodness (template matching) into
XQuery ... I know the technical args for and against, but it would be
interesting but perhaps too bold a move to consider.

* the hegemony that is javascript in the browser is one of the real
problems ... we need to have more 'real' standardized choices on the
web clientside (XQuery, Perl, anything) and whilst there will be those
who argue that javascript is a great language, one could argue that a
language that needs a framework (e.g. jquery) to be useful is maybe
one that has some fundamental design issues, but I have nothing
against a language that looks like a funny version of c (which is a
fav of mine) but just that ecmascript is the only relatively standard
option across all browsers. We all thought XSLT might be that 'other'
language but for one reason or another it has failed to take off
(another reason for the web crowd to hate us for); I doubt we will see
XSLT v2.0 in any browser soon.

* nosql databases have made inroads where xml databases have not, and
this is mostly down to aspects of comparable scale and performance ...
I see standards such as websockets being very important and I don't
see why XML databases can't just expose AJAX push & JSON interfaces
over their datastores. Also, let's remember that nosql has some very
deep pockets when it comes to what companies are supporting the
development of these standards/implementations ... it's great that a
lot of the technology is open source but we need to come up with new
tricks to compete.

* XML may seem complicated to 'today's' developers but that's because
they never spent hours with a hex editor debugging a binary
undocumented data format ... so pain is relative ... for many people
XML is about angle brackets, which is clearly just syntax ... we need
to make it very easy to go from XML->JSON and back ... if this means
we create a short hand version of XML that is easily consumed and
understood by the 'javascript generation', so be it but the longer we
look like the 'complicated data format' the longer we look like this
generation's 'binary'.

* xpath is the quiet superpower underneath XQuery/XSLT ... I always
wondered why things like
http://docs.jquery.com/DOM/Traversing/Selectors#XPath_Selectors are
not being used more or why XPATH itself has not found its way deep
into the web architecture/browser.

* XQuery is a funny language and I like it as (once we get HoF) it's
essentially a functional programming language for those who work with
XML. It hits a certain sweet spot and in combination with an XML
database makes developing web data applications a breeze. Though as
others have mentioned, it is a bit of a mess, I do worry about the
emerging proliferation of libraries which could be standardized ...
but I don't think a W3C WG fixing these (some of them minor) issues
would have a big impact on adoption.

* If we look past the vagaries of tag soup, the web today can be
looked at as a very large distributed database of documents where
search engines have become the main query endpoint and REST the
mechanism for interacting ... there has to be a lot of opportunities
there where XML databases and XQuery have a bigger future than just a
niche publishing technology but we need to come up with some good
ideas sooner rather than later.

James Fuller
Daniela Florescu
2010-07-26 05:13:01 UTC
Permalink
Amen, James, to almost everything you said.

>
> significant portions of the developer community are opposed to
> XML, mostly due to issues of perception rather than technical analysis

That perception is what pushed me to start this thread in the first
place.

I am SO tired of it, you couldn't believe it.

It's 10 years old, or more, maybe it's time to wake up, take a deep
breath, and reconsider.

This "XML is evil" starts to grow old (and more and more stupid) on me,
and I am surprised that smart IT people like Tim O'Reilly and comp.
don't divorce themselves
from it (they should, but again, who am I to judge).


> There have been
> some saying why not put all of XSLT goodness (template matching) into
> XQuery ... I know the technical args for and against, but it would be
> interesting but perhaps too bold a move to consider.

I am all for it. i.e. adding XSLT pattern matching power to XQuery.


>
> * the hegemony that is javascript in the browser is one of the real
> problems ... I doubt we will see
> XSLT v2.0 in any browser soon.

I've seen from experience that a plugin is cute, but doesn't work.
Another solution would be
to build a full XQuery processor in Javascript, or compile XQuery into
Javascript code.

I have no other ideas here...but I am all ears.

However, maybe XQuery and XSLT will not make it on the browsers,
but one thing I am sure of is that Javascript won't make it on the
server side either.

So GREAT: we'll be in a standoff (client vs. server) for a while.

>
> * nosql databases have made inroads where xml databases have not, and
> this is mostly down to aspects of comparable scale and performance ...

Huh. Not sure about that.

While there are fundamental principles that those guys got right (i.e.
no schema
and no ACID transactions in distributed systems), they missed high
level declarative programming,
and, my God, don't forget that one can prove that Cassandra in current
incarnation
loses updates. (and Cassandra is probably the best among them..)

[[ not that updates get DELAYED, which is OK, they are literally LOST]]

I mean: what do you think Oracle (or DB2, or SQLServer, or MySQL)
customers will think about that !???

> I see standards such as websockets being very important

Me too. I like them, and I wish them well to get mature. But they
aren't yet. They are XX years
away from that.

They remind me of the XML databases in 2002. (sorry, I am old).

Literally, they are like XML databases 10 years ago: they just ADD
another layer to
a typical infrastructure, without simplifying anything. And who wants
THAT !? As if it
is not complicated ENOUGH !?

They are on the right path, but they aren't there yet.

But XQuery should learn from what they've done right: NO ACID
transactions by default,
and, especially, not in distributed systems.


> and I don't
> see why XML databases can't just expose AJAX push & JSON interfaces
> over their datastores.

Huh. All serious XML databases produce and consume JSON, don't they !?

> Also, let's remember that nosql has some very
> deep pockets when it comes to what companies are supporting the
> development of these standards/implementations ...

I think you are getting things wrong :-)

There are WAY deeper pockets around XML than around NoSQL.
[[ just a thought: NIEM is an XML standard, BPEL is an XML standard,
and HL7 is an XML standard :-)]]

There is just more NOISE around NoSQL .

Just don't be fooled by the relative silence around XML.
(Only fools confuse silence with nothingness :-)


> its great that a
> lot of the technology is open source but we need to come up with new
> tricks to compete.
>
> * XML may seem complicated to 'todays' developers , we need
> to make it very easy to go from XML->JSON and back ... if this means
> we create a short hand version of XML that is easily consumed and
> understood by the 'javascript generation',

Yep. That's what we need. How quickly can we convince the W3C to
standardize that !?? :-)

>
>
> * xpath is the quiet superpower underneath XQuery/XSLT

Yep. Agree 100%.

>
> * XQuery is a funny language and I like it as (once we get HoF) it's
> essentially a functional programming language for those who work with
> XML. It hits a certain sweet spot and in combination with an XML
> database makes developing web data applications a breeze.

Amen.

> Though as
> others have mentioned, it is a bit of a mess,

Yes, it is. No doubt. It's still a mess. But a funny little useful mess.

> I do worry about the
> emerging proliferation of libraries which could be standardized ...

I do worry about that too. The sooner we find a way to standardize
libraries, the better!


> there has to be a lot of opportunities
> there where XML databases and XQuery have a bigger future than just a
> niche publishing technology but we need to come up with some good
> ideas sooner rather than later.

OK. That's why I started this thread. What do you propose ?


Best regards
Dana
James Fuller
2010-07-26 06:49:28 UTC
Permalink
> James Fuller said:
> * nosql databases have made inroads where xml databases have not, and
> this is mostly down to aspects of comparable scale and performance ...
> Daniela Florescu said in response:
> Huh. Not sure about that.
> While there are fundamental principles that those guys got right (i.e. no
> schema
> and no ACID transactions in distributed systems), they missed high level
> declarative programming,
> and, my God, don't forget that one can prove that Cassandra in current
> incarnation
> loses updates. (and Cassandra is probably the best among them..)
> [[ not that updates get DELAYED, which is OK, they are literally LOST]]
> I mean: what do you think Oracle (or DB2, or SQLServer, or MySQL)
> customers will think about that !???

yes, data loss is not ever an acceptable scenario in 2010!

let's not forget that there will be customers who will decide on
'should I pay a license' or is using open source acceptable.

I think nosql is reminding us that we can look at all that RDBMS are
and think about the future and reconsider assumptions.

> James Fuller said:
> Also, let's remember that nosql has some very
> deep pockets when it comes to what companies are supporting the
> development of these standards/implementations ...
> Daniela Florescu said in response:
> I think you are getting things wrong :-)
> There are WAY deeper pockets around XML than around NoSQL.
> [[ just a thought: NIEM is an XML standard, BPEL is an XML standard,
> and HL7 is an XML standard  :-)]]
> There is just more NOISE around NoSQL .
> Just don't be fooled by the  relative silence around XML.
> (Only fools confuse silence with nothingness :-)

agreed, I am talking about the developer 'mind share' (u mentioned in
previous posts) ... BPEL/NIEM/HL7/XBRL are enterprise technologies
which IMHO foster little grassroots support with developers (web or
otherwise) who have no need of ever using these technologies until
they are developing software within the enterprise. I would also note
that the level of transparency or information sharing between these
communities is mismatched. The nosql crowd is offering open source
technologies which impart 'planet scaling' using interesting (albeit
sometimes flawed) and accessible techniques that developers can try
out now.

For example,

http://royal.pingdom.com/2010/06/18/the-software-behind-facebook/

shows a pretty high level of transparency about what Facebook is
using to get work done but I know that there are also a lot of big
enterprise workhorses supporting things at Facebook just as well.

Perhaps this is yet again showing that Open Source is a superior
development approach ... e.g. maybe today their technologies don't work
perfectly but by open sourcing and involving other developers, high
transparency and evolution will probably take over and present more
capable software in the future as they 'iterate' development.

I think we need to be careful about catering for the needs of a few groups

* 'Browser/js generation': what are we offering to the web people of
today who want to learn technologies that make their lives easier and
cooler ?

* 'Planet wide': what are we offering to the cool web people today,
who are trying to solve hard problems.

* 'Enterprise': over the past 5 years this group is embracing web
standards, but there is a sense that a lot of complexity is artificial
(as shown by 'planet wide' group) as there can be simpler ways to
solve problems. That being said, enterprises do have hard problems
which 'oversimplifying' can exacerbate.

* Everyone else (mobile, etc)

> Daniela Florescu said in response:
> OK. That's why I started this thread. What do you propose ?

* XML database vendors need to get organized

* investigate short hand XML for the 'javascript generation' asap

* investigate ways of pushing xpath deep into the web stack, e.g. both
client/server side as replacement for DOM (need to investigate
'setting' e.g. update scenarios)

* encourage/sponsor EXPATH & EXQUERY, along with CXAN repository

I have a few more thoughts but need to draft them a bit more clearly
before I present here.

James Fuller
COUTHURES Alain
2010-07-26 07:12:40 UTC
Permalink
Hi James,
> * investigate short hand XML for the 'javascript generation' asap
>
Can you please be more specific about this point?

Thanks!

-Alain
Pavel Minaev
2010-07-26 07:34:09 UTC
Permalink
On Mon, Jul 26, 2010 at 12:12 AM, COUTHURES Alain
<alain.couthures-g9Gpw7ZaukQS+***@public.gmane.org> wrote:
>> * investigate short hand XML for the 'javascript generation' asap
>>
>
> Can you please be more specific about this point?

I think this implies some kind of shorthand syntax for Infoset (or
maybe even XDM) such that it can stand up to JSON and the likes in
brevity for non-mixed content.
Martin Probst
2010-07-26 08:14:12 UTC
Permalink
James Fuller <james.fuller.2007-***@public.gmane.org> wrote:
> we need to make it very easy to go from XML->JSON and back

The trouble with that is the edge-labelled vs. node-labelled tree
problem. In XML, labels (tag names) stick to the individual nodes, in
JSON, those names stick to the edges that connect nodes. The data
formats are incompatible on a relatively low level. You'll have to
find a way to express mixed content and repeating elements in JSON,
and likewise a way to express the JSON data structures (maps, lists)
in XML. And on top of that, even on the lowest level we have
incompatibilities, for example JSON allows key values that are not
NCNames.

*** on a side note, I think this is also why JSON has so much more
uptake for web programmers and other communities sending programmatic
data around. Edge-labelled trees are trivial to map into most
programming language's object models (they avoid the object<->XML
impedance mismatch), and I think they are also closer to how people
think about the world around them in general. ***

Effectively, you can either have a horrible JSON format that covers
most of XML (even if we decide to ignore attributes vs elements,
entities, and so on), and no one will want to use it. Or you can
decide to have a relatively cumbersome XML format that captures all
the semantics of JSON docs.
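
A minimal sketch of the second option, a node-labelled XML format that captures a JSON document, assuming hypothetical element names (map, array, string, number, boolean, null) and a hypothetical "key" attribute. The JSON key goes into an attribute value rather than a tag name precisely because keys such as "first name" are not NCNames. This is an illustration in Python, not a proposed standard mapping.

```python
import json
import xml.etree.ElementTree as ET


def json_to_xml(value, key=None):
    """Map a parsed JSON value to a node-labelled XML tree (hypothetical format)."""
    if isinstance(value, dict):
        node = ET.Element("map")
        for k, v in value.items():
            node.append(json_to_xml(v, k))
    elif isinstance(value, list):
        node = ET.Element("array")
        for v in value:
            node.append(json_to_xml(v))
    elif isinstance(value, bool):  # must be tested before int: bool is an int subclass
        node = ET.Element("boolean")
        node.text = "true" if value else "false"
    elif value is None:
        node = ET.Element("null")
    elif isinstance(value, (int, float)):
        node = ET.Element("number")
        node.text = repr(value)
    else:
        node = ET.Element("string")
        node.text = value
    if key is not None:
        # The edge label (JSON key) becomes an attribute on the child node.
        node.set("key", key)
    return node


doc = json.loads('{"first name": "Ann", "tags": ["a", "b"]}')
xml_text = ET.tostring(json_to_xml(doc), encoding="unicode")
print(xml_text)
```

The result round-trips, but nobody would choose to write XPath against it by hand, which is the "relatively cumbersome" trade-off described above.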

In the end, both formats will be ugly and more or less annoying to
program against with the regular tools for the respective language.
Also, these conversions will be leaky: if someone decides to change
something in the XML representation of a JSON document, it might no
longer be possible to map that back into JSON. Ouch.

It's not really hard to define an acceptable mapping for a specific
given format, but a general mechanism will be ugly and hard to use. I
don't think defining a mechanism that will always be hard to use and
probably introduce many minor issues is necessarily a good idea.

Martin
COUTHURES Alain
2010-07-26 09:10:03 UTC
Permalink
James,
> we need to make it very easy to go from XML->JSON and back ...
There are missing constraints in XML format for good JSON generation and
this can be seen as type information.

In XML, there are strings everywhere while, in JSON, you can have dates,
numbers, booleans too.

Structures such as:

<root>
<a>1</a>
<a>2</a>
</root>

can be transformed into:
{ "a": [1, 2] }

but:

<root>
<a>1</a>
</root>

would automatically be transformed into:

{ "a": 1 }

instead of:

{ "a": [1] }

Indicating in the XML document that this has to be an array for JSON is
mandatory.

Naming conventions can help or xsi:type use will do the trick.

From JSON to XML, the attributes vs. elements choice remains to be decided but,
again, naming conventions could help.
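
The singleton-versus-array ambiguity above can be sketched in a few lines of Python. The function name naive_to_json and its always_list parameter are hypothetical; always_list stands in for whatever hint (a naming convention, xsi:type, a schema) tells the converter that a child must become an array. The sketch assumes flat documents with integer content, as in the examples.

```python
import json
import xml.etree.ElementTree as ET


def naive_to_json(xml_text, always_list=()):
    """Convert flat <root><a>..</a></root> XML into a dict.

    `always_list` is a hypothetical hint naming the children that must
    become JSON arrays even when they occur only once.
    """
    root = ET.fromstring(xml_text)
    out = {}
    for child in root:
        value = int(child.text)  # assumes integer content, as in the examples
        if child.tag in out:
            # Second occurrence: promote the existing scalar to a list.
            if not isinstance(out[child.tag], list):
                out[child.tag] = [out[child.tag]]
            out[child.tag].append(value)
        elif child.tag in always_list:
            out[child.tag] = [value]
        else:
            out[child.tag] = value
    return out


two = naive_to_json("<root><a>1</a><a>2</a></root>")
one = naive_to_json("<root><a>1</a></root>")
hinted = naive_to_json("<root><a>1</a></root>", always_list={"a"})
print(json.dumps(two), json.dumps(one), json.dumps(hinted))
```

Without the hint, the shape of the JSON depends on how many items happen to be in the instance, so client code has to handle both a scalar and an array for the same field.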

What do you think?

-Alain
James Fuller
2010-07-26 09:18:02 UTC
Permalink
Thx to Martin for the succinct overview of issues with XML<->JSON,

I guess I am more leaning towards coming up with 'something different'
... maybe look at what serializing/dumping e4x internal representation
is as I am assuming it's a normal js object.

I think if we get a serialization that maps directly onto a js object
then we are done, e.g. we just need a representation that is
consumable by javascript ... and I think e4x may be the hook here.

J
Martin Probst
2010-07-26 09:39:43 UTC
Permalink
> I guess I am more leaning towards coming up with 'something different'
> ... maybe look at what serializing/dumping e4x internal representation
> is as I am assuming it's a normal js object.

As I said, the problem is the 'philosophy' of the data format. I think
you will have a really hard time coming up with something that unifies
both, again under the assumption it should still be useable.

> I think if we get a serialization that maps directly onto a js object
> then we are done, e.g. we just need a representation that is
> consumable by javascript ... and I think e4x may be the hook here.

One can directly consume XML from an AJAX call in JavaScript, right?
So what would be the point of using E4X?

Martin
Brett Zamir
2010-07-26 10:13:49 UTC
Permalink
On 7/26/2010 5:39 PM, Martin Probst wrote:
>> I think if we get a serialization that maps directly onto a js object
>> then we are done, e.g. we just need a representation that is
>> consumable by javascript ... and I think e4x may be the hook here.
> One can directly consume XML from an AJAX call in JavaScript, right?
> So what would be the point of using E4X?
I'll let James answer for his own case, but it is cumbersome not to have
the same means of expressing inline XML in JavaScript when you need it
without resorting to ugly and likely concatenated strings. I also don't
think that even the standard XPath available in browsers that support it
is as comfortable to use out of the box as E4X (nor does it do in-place
updating). It can be disruptive to development if you have to store all
your XML in separate files (though if you're only using it for data
storage, then it's probably ok to do it with Ajax->DOM alone).

Brett
Martin Probst
2010-07-26 11:38:36 UTC
Permalink
>> So what would be the point of using E4X?
>
> I'll let James answer for his own case, but it is cumbersome not to have the
> same means of expressing inline XML in JavaScript when you need it without
> resorting to ugly and likely concatenated strings.

Sorry, I wasn't precise. What I meant was, what would be the advantage
of having an E4X based serialization format that is transmitted over
the wire, as compared to the regular XML via AJAX method. I can easily
see how having E4X for development in the browser is a big advantage.

> It can be disruptive to development if you have to store all your XML in
> separate files (though if you're only using it for data storage, then it's
> probably ok to do it with Ajax->DOM alone).

I think for a backend service, you'd be generating your XML anyway,
wouldn't you?

Martin
James Fuller
2010-07-26 12:13:42 UTC
Permalink
On Mon, Jul 26, 2010 at 1:38 PM, Martin Probst <mail-2udWM7XjvTsOyoagqWviFgC/***@public.gmane.org> wrote:
>>> So what would be the point of using E4X?
>>
>> I'll let James answer for his own case, but it is cumbersome not to have the
>> same means of expressing inline XML in JavaScript when you need it without
>> resorting to ugly and likely concatenated strings.
>
> Sorry, I wasn't precise. What I meant was, what would be the advantage
> of having an E4X based serialization format that is transmitted over
> the wire, as compared to the regular XML via AJAX method. I can easily
> see how having E4X for development in the browser is a big advantage.

sorry, not being cryptic on purpose, distracted by work ...

What all this means is that XML seems to be losing a fight it
shouldn't e.g. it *should* be the 'no brainer' format for
representing document orientated data but it seems that the transport
level optimization and integration characteristics of JSON are winning
over web developers versus sheer power for document
representation/manipulation.

In the browser, developers are eschewing directly working with XML and
I thought promoting and enhancing E4X could be a way back into the
hearts of developers e.g. by showing how powerful XPATH is (with E4X
xpath extension over DOM) and making it easy to work with XML natively
in whatever language you use. This solves the integration issue.

In parallel, we need to solve the optimization issue ... arguably Web
3.0 is more about the 'real time' web than anything else and AJAX w/
JSON is powering that, but even as things stand now JSON won't be
enough in the near future and that's why I mentioned standards like
websockets.

It's a shame that binary (efficient) XML wasn't able to figure out
something reasonable and we are now having to speak about XML<->JSON
... but let's be clear: XML developers will probably stick
with XML, and any kind of attempt at a shorthand to try and square the
circle should be aimed at the web crowd. You (and others) are right to
point out the difficulties but harder things have been achieved.

re E4X I answered somewhat glibly last time, maybe suggesting adding a
serialize method to E4X that emitted some JSON would have a
synthesizing effect, e.g. browser developers could easily use XML or
JSON and it would be trivial to implement server side E4X.

cheers, J
Michael Kay
2010-07-26 12:59:28 UTC
Permalink
> What all this means is that XML seems to be losing a fight it
> shouldn't e.g. it *should* be the 'no brainer' format for
> representing document orientated data but it seems that transport
> level optimization and integration characteristics of JSON is winning
> over web developers versus sheer power for documentation
> representation/manipulation.
>
>

I think we all need to be very wary of making assertions about who is
winning or losing, or what "web developers" are thinking, based on tiny
samples of chit chat overheard in the pub. Yes, we're in a
fashion-dominated industry, and trends are often determined by
self-fulfilling rumours about what the trends are, but it's not a
feedback cycle we should try to encourage. Let's try to base our
reasoning on verifiable facts.

Personally, the evidence I see is that XML is now the mainstream choice
for data interchange for all large projects -- we're in the peak
adoption phase. That's good news for anyone delivering into that
community. Inevitably there is chatter about alternatives among early
adopters. But you don't win in this game by chasing every new fad. At
this stage we should focus on meeting the needs of the large XML-using
community, not worrying about the people on the fringes who have decided
to use something else instead.

Michael Kay
James Fuller
2010-07-26 14:33:04 UTC
Permalink
On Mon, Jul 26, 2010 at 2:59 PM, Michael Kay <mike-JkSD5nQpfvpWk0Htik3J/***@public.gmane.org> wrote:
> I think we all need to be very wary of making assertions about who is
> winning or losing, or what "web developers" are thinking, based on tiny
> samples of chit chat overheard in the pub. Yes, we're in a fashion-dominated
> industry, and trends are often determined by self-fulfilling rumours about
> what the trends are, but it's not a feedback cycle we should try to
> encourage. Let's try to base our reasoning on verifiable facts.

Fair enough, and this is good advice for any technology selection, but I
don't think any of us is bringing this up based on idle
chit chat. If you have data contrary to the vast amount of work going on
online, at conferences, in books and yes, maybe even in pubs, I'd be happy to
see it.

I would also add that we need to be very careful about what groups we
are talking about e.g. enterprise versus web developers versus mobile
developers, etc.

I also agree we shouldn't have a crisis whenever an early adopter
comes along with something novel or faster, but as I previously
mentioned the real time web will demand ever more performance from the
software that all groups of developers create, if XML (and
technologies like XQuery, etc) are not keeping up in terms of the
performance race we make it easier for people to choose other things.
With Binary XML languishing where are these improvements going to
happen with data transport ?

James Fuller
Michael Kay
2010-07-26 15:07:39 UTC
Permalink
> Fair enough and this is good advice for any technology selection but I
> don't think any of us is bringing up this because it's based on idle
> chit chat, if you have data contrary to a vast amount of work going
> online, at conferences, books and yes maybe even in pubs be happy to
> see it.
>
>

I don't have data - I don't think anyone does. But I'm pretty confident
that what people talk about at conferences and the like is not
statistically representative of what people are doing in the typical IT
shop. XML after 12 years has become respectable and boring, and it's
when technologies become respectable and boring that people stop talking
about them but carry on using them - and that's the phase when suppliers
make the most money. No-one talks about SQL any more, but everyone is
still using it.

Michael Kay
James Fuller
2010-07-26 16:12:27 UTC
Permalink
On Mon, Jul 26, 2010 at 5:07 PM, Michael Kay <mike-JkSD5nQpfvpWk0Htik3J/***@public.gmane.org> wrote:
>
>> Fair enough and this is good advice for any technology selection but I
>> don't think any of us is bringing up this because it's based on idle
>> chit chat, if you have data contrary to a vast amount of work going
>> online, at conferences, books and yes maybe even in pubs be happy to
>> see it.
>>
>>
>
> I don't have data - I don't think anyone does. But I'm pretty confident that
> what people talk about at conferences and the like is not statistically
> representative of what people are doing in the typical IT shop. XML after 12
> years has become respectable and boring, and it's when technologies become
> respectable and boring that people stop talking about them but carry on
> using them - and that's the phase when suppliers make the most money. No-one
> talks about SQL any more, but everyone is still using it.

I agree with you that XML is now part of every developer's toolbox and
closer to peak adoption phase and boring. I am not trying to convince
you that your viewpoint is incorrect, it's true but several things can
be true at the same time. Anyhow, I have drifted into xmldev territory
... so will try to bring this back to how it relates to xquery.

One thing that we can do is make XQuery very easy to work with web
services; expath http means we have a standard library with which to
build some more sophisticated and generic web services clients. Where
to go from here ?

thx, J
Brett Zamir
2010-07-26 09:53:48 UTC
Permalink
On 7/26/2010 5:18 PM, James Fuller wrote:
> Thx to Martin for the succinct overview of issues with XML<->JSON,
>
> I guess I am more leaning towards coming up with 'something different'
> ... maybe look at what serializing/dumping e4x internal representation
> is as I am assuming it's a normal js object.
>
> I think if we get a serialization that maps directly onto a js object
> then we are done, e.g. we just need a representation that is
> consumable by javascript ... and I think e4x may be the hook here.

E4X is only an extension to JavaScript, so it is only currently
supported in Mozilla, and V8 (Chrome) is open to including it if a third
party implements it: http://code.google.com/p/v8/issues/detail?id=235

E4X is treated as a fundamental type rather than as a regular object,
with its own accessors which are JS object-like (and more powerful if
filters are used, making it more XPath like).

For example,

var friends= <myData><friends>
<name type="imaginary">Snuffaluffagus</name>
<name type="real">Goober</name>
<name type="imaginary">Tooth Fairy</name>
</friends></myData>;

var realFriends= friends..name.(@type== 'real').toString(); // Goober


You can also reshape data. If we just change the equality check to an
assignment:

var realFriends= friends..name.(@type= 'real').toString(); // serializes all three <name> elements


...we get as the result an "XMLList" (and the original element altered
in place):

<name type="real">Snuffaluffagus</name>
<name type="real">Goober</name>
<name type="real">Tooth Fairy</name>
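
For comparison, a rough Python sketch of the same two operations using the standard library's ElementTree and its limited XPath subset. This is not E4X; it only shows that outside E4X the filter and the in-place update are two explicit steps, so the equality-versus-assignment slip cannot happen. Variable names are illustrative.

```python
import xml.etree.ElementTree as ET

friends = ET.fromstring(
    "<myData><friends>"
    '<name type="imaginary">Snuffaluffagus</name>'
    '<name type="real">Goober</name>'
    '<name type="imaginary">Tooth Fairy</name>'
    "</friends></myData>"
)

# Filter: descendant <name> elements whose type attribute equals 'real'.
real = [n.text for n in friends.findall(".//name[@type='real']")]

# Update in place: set type="real" on every <name>, then filter again.
for n in friends.iter("name"):
    n.set("type", "real")
after = [n.text for n in friends.findall(".//name[@type='real']")]

print(real, after)
```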


To simulate some XQuery-like features, you might see
https://developer.mozilla.org/en/E4X_for_templating , but E4X does not
have a lot of built-in means of reshaping the XML other than as above or
creating element names, attribute names, or attribute values and element
content dynamically.

As far as serializing to more portable objects, there is a built-in
method, "toXMLString()" for serializing the whole E4X object which
represents the XML into a string (toString() will serialize the XML or
return the text contents if the referenced element contains only
text). (Strings can in turn be converted to DOM objects or whatever.)

But there is no regular JSON traversal serialization.

Wonder if someone will make an E4X database that can also be XQueried... :)

Brett
Xavier Franc
2010-07-26 13:04:42 UTC
Thanks for your insights, Dana,

> Switching between databases ain't as easy as switching between
> compilers.

Sure, but that's an even better reason to be cautious before selecting a
product.

> How much to know the names ? Just kidding :-)
> (just crossing fingers is not one of mine's...)
>
I passed the names to WikiLeaks anonymously, so you will know sooner or
later
(kidding too)

> > Perhaps a good independent benchmark would help.
>
> Hmmm. It's hard.
>
Certainly, and a benchmark would lead to endless arguments about
its relevance or impartiality.

However a few simple tests can give some interesting indications about
an implementation: I have a particularly dumb snippet which I try
on all XQuery engines I encounter:
count(1 to 10000000)

If an engine bombs on it or takes a long time, I have a good clue that
it is /not/ smartly implemented. Of course more tests are necessary, but
if a product fails such testing, there is little chance that it is able
to optimize a heavy join with complex path expressions.

> When vendors start competing on numbers, it starts to be fun :-)
>
As an implementor I wouldn't mind. I know that our product (Qizx)
has more to gain than to fear from such a competition. ;-)

Best
--

Xavier Franc
Elliotte Rusty Harold
2010-09-12 13:19:39 UTC
On Mon, Jul 26, 2010 at 9:04 AM, Xavier Franc <xavier_f-1g3y2VubmHnQT0dZR+***@public.gmane.org> wrote:

> However a few simple tests can give some interesting indications about
> an implementation: I have a particularly dumb snippet which I try
> on all XQuery engines I encounter:
>     count(1 to 10000000)
>
> If an engine bombs on it or takes a long time, I have a good clue that
> it is not smartly implemented. Of course more tests are necessary, but
> if a product fails such testing, there is little chance that it  is able to
> optimize
> a heavy join with complex path expressions.


FWIW, I notice that this query brings down eXist 1.4 with an out of
memory error:

let $x := count(1 to 10000000)
return $x

:-(

--
Elliotte Rusty Harold
elharo-***@public.gmane.org
Wolfgang Meier
2010-09-12 14:21:49 UTC
> FWIW, I notice that this query brings down eXist 1.4 with an out of
> memory error:
>
> let $x := count(1 to 10000000)
> return $x

Yes, it's not optimized in any way and generates a large sequence of
java.math.BigInteger objects. I almost never use range expressions,
so no work has been put into optimizing them.

Wolfgang
Dannes Wessels
2010-09-12 15:02:26 UTC
OK, so you found an issue in eXist-db...

Why not report it to the development team instead of writing about it directly in public?

D.



On 12 Sep 2010, at 15:19 , Elliotte Rusty Harold wrote:

> FWIW, I notice that this query brings down eXist 1.4 with an out of
> memory error:
>
> let $x := count(1 to 10000000)
> return $x
>
> :-(

Kind regards

Dannes

--
eXist-db Native XML Database - http://exist-db.org
Join us on linked-in: http://www.linkedin.com/groups?gid=35624
David
2010-09-12 15:04:21 UTC
I just tried this on MarkLogic and it appears not to be optimized
well (although it must be somewhat).

'count(1 to 10000000)'

takes a few seconds, but

'count(1 to 100000000)'

takes 10 times longer, so something linear is going on.

Saxon, OTOH, returns instantly.


David A. Lee
dlee-***@public.gmane.org
http://www.xmlsh.org


Wolfgang Meier
2010-09-12 17:31:17 UTC
>  I just tried this on MarkLogic and its appears to not be optimized well
> (although must be somewhat).

Well, it's easy enough to move this to lazy evaluation. Out of
curiosity, I just did it for eXist-db (SVN rev 12696). It took half an
hour, including a few full runs through the XQuery test suite.

count(1 to 10000000) now returns instantly as well ;-) I'm sure
MarkLogic can do this as easily if it is brought to their attention.

Wolfgang
Michael Kay
2010-09-12 18:33:48 UTC
On 12/09/2010 4:04 PM, David wrote:
> I just tried this on MarkLogic and its appears to not be optimized
> well (although must be somewhat).
>
> 'count(1 to 10000000)'
>
> Takes a few seconds
> but
> 'count(1 to 100000000)'
>
> Takes 10 times longer
> So something linear is going on .
>
> Saxon OTOH returns instantly
>
It's likely that most engines will implement this in one of three ways:

(a) naively materialize the sequence of N integers in memory, then
obtain the size of the sequence.

(b) iterate over the sequence 1 to N and count how many iterations are
required

(c) (Saxon's implementation) recognize a RangeExpression as an
expression whose cardinality can be computed in a way specific to that
expression (here as (end - start + 1)) without actually evaluating the
expression. All singleton expressions and many expressions with
cardinality 0-or-1 fall into this category, as do a few others: for
example count(for F in S return R) reduces to count(S) if R is a
singleton expression.
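
The three strategies can be sketched roughly as follows. This is JavaScript written for illustration, not code from Saxon or any other engine; the function names are invented for this sketch.

```javascript
// (a) Materialize: build all N integers in memory, then take the length.
function countMaterialized(start, end) {
  const items = [];
  for (let i = start; i <= end; i++) items.push(i);
  return items.length;
}

// (b) Iterate: produce the items lazily, one at a time, and count them.
function* range(start, end) {
  for (let i = start; i <= end; i++) yield i;
}

function countIterated(start, end) {
  let n = 0;
  for (const _ of range(start, end)) n++;
  return n;
}

// (c) Closed form: derive the cardinality from the bounds alone,
// without producing a single item. An empty range has size 0.
function countClosedForm(start, end) {
  return end < start ? 0 : end - start + 1;
}
```

All three agree on the answer. (a) needs memory proportional to N, (b) needs time proportional to N but constant memory, and (c) needs neither, which fits the timings reported above.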

The fact that one expression is optimized well (or badly) shouldn't
really be used to draw a general inference about the optimizer for a
particular product; you need to assess it over a reasonably
representative range of queries. I suspect that many products coming
from the database tradition have put most of their optimization efforts
into joins, because that's what 90% of the database optimization
literature concerns itself with. Unfortunately the vast majority of
XQuery queries probably do no joins at all.

Michael Kay
Saxonica
Kurt Cagle
2010-09-12 20:39:20 UTC
The join issue in particular is a telling one, and one that to me shows the
danger of "generalizing optimizations". Some months back, I was working with
a group of Java developers that were also working (somewhat reluctantly)
with both XQuery and XForms in another project. At one point they came
to me and said that they were trying to figure out how to optimize
their XForms and XQuery so that they could do joins. I pointed out that the
concept of joins really doesn't apply in XQuery - you could certainly
de-reference a link or set of links and incorporate them, and that, at least
for the kind of queries they were performing, that was a very fast operation
out of MarkLogic (the database they were using), but there was no formal
join operator in that respect because it wasn't needed. Of course, two weeks
later they were still complaining about having problems with joins in their
XForms and XQueries.

The expression:
count(1 to $n)

is not optimized because there really is no reason for it to be - if you've
written something like this, then you've probably made a mistake, at which
point having the compiler seize up for a few minutes is probably not a bad
thing - it'll tell you that you've likely made a mistake.

On the other hand,

count(for $i in (1 to $n) return local:someFunction($i))

should be iterated (and really can't be optimized), because there's nothing
that says that local:someFunction($i) will not in and of itself return
either an empty sequence or a sequence of more than one item. On the other
hand, if the function local:someFunction($i) has a signature output of one
item then there may be some optimization that could be performed, though in
my experience this is also still very much an edge case.
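
A small JavaScript sketch of that point, under the assumption that sequences are modelled as arrays; someFunction and exactlyOne are hypothetical stand-ins, not real library functions. The count equals $n only when the function is known to return exactly one item.

```javascript
// Hypothetical stand-in for local:someFunction: returns zero, one or three items.
function someFunction(i) {
  if (i % 3 === 0) return [];   // empty sequence
  if (i % 3 === 1) return [i];  // exactly one item
  return [i, i, i];             // more than one item
}

// Hypothetical function whose result is always exactly one item.
function exactlyOne(i) {
  return [i * i];
}

// count(for $i in 1 to $n return f($i)): the results flatten into one
// sequence, so the total is the sum of the individual lengths.
function countFor(n, f) {
  let total = 0;
  for (let i = 1; i <= n; i++) total += f(i).length;
  return total;
}
```

With exactlyOne the loop is redundant, because the result is always n; with someFunction there is no way to know the total without calling the function for every item.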

Kurt Cagle
XML Architect
*Lockheed / US National Archives ERA Project*



Michael Kay
2010-09-12 21:39:11 UTC
> count(1 to $n)
>
> is not optimized because there really is no reason for it to be - if
> you've written something like this, then you've probably made a mistake

In my experience, the argument "don't bother optimizing X because no
sane user would write X" is usually a bad argument.

(a) there are cases where the code is auto-generated rather than
human-written

(b) in particular, there are cases where the code is generated by a
previous optimization (this is very common!)

For example, consider a function

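(: $query-param is assumed to be supplied by the caller as an external variable :)
declare variable $query-param external;

declare function local:is-long-sequence($arg) {
  count($arg) gt $query-param
};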

and the perfectly reasonable function call

local:is-long-sequence($n to $m)

An implementation that inlines the function call will generate the
expression count($n to $m) gt $query-param, which can then be further
optimized to ($m - $n + 1 gt $query-param).

(c) there are typically both compile-time and run-time optimizations,
where run-time optimization essentially means "lazy evaluation". Lazy
evaluation often means iterating over the sequence of values of the
expression one at a time, but it can also mean, for example, obtaining
the last item in the sequence or the number of items in the sequence
without evaluating the rest of the sequence.
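
As a rough illustration of the inlining case in (b), here is a JavaScript sketch of such a rewrite over a tiny expression tree. The tree shape and every name in it (variable, rangeExpr, countExpr, rangeSize, simplify, evaluate) are invented for this sketch and do not reflect any engine's internals. Note that the size is clamped at zero, since a range whose end is below its start is empty.

```javascript
// Node constructors for a tiny expression tree (invented for this sketch).
const variable = (name) => ({ kind: "var", name });
const rangeExpr = (start, end) => ({ kind: "range", start, end });
const countExpr = (arg) => ({ kind: "count", arg });
const gt = (left, right) => ({ kind: "gt", left, right });
const rangeSize = (start, end) => ({ kind: "rangeSize", start, end });

// Compile-time rewrite: count(a to b) becomes a size computation on the bounds.
function simplify(expr) {
  switch (expr.kind) {
    case "count": {
      const arg = simplify(expr.arg);
      return arg.kind === "range" ? rangeSize(arg.start, arg.end) : countExpr(arg);
    }
    case "gt":
      return gt(simplify(expr.left), simplify(expr.right));
    default:
      return expr;
  }
}

// Straightforward evaluator; a "range" node materializes the whole sequence.
function evaluate(expr, env) {
  switch (expr.kind) {
    case "var":
      return env[expr.name];
    case "range": {
      const items = [];
      const end = evaluate(expr.end, env);
      for (let i = evaluate(expr.start, env); i <= end; i++) items.push(i);
      return items;
    }
    case "count":
      return evaluate(expr.arg, env).length;
    case "rangeSize":
      // An empty range (end < start) has size 0, hence the clamp.
      return Math.max(0, evaluate(expr.end, env) - evaluate(expr.start, env) + 1);
    case "gt":
      return evaluate(expr.left, env) > evaluate(expr.right, env);
    default:
      throw new Error("unknown node kind: " + expr.kind);
  }
}

// The inlined body of local:is-long-sequence($n to $m).
const inlined = gt(
  countExpr(rangeExpr(variable("n"), variable("m"))),
  variable("queryParam")
);
const optimized = simplify(inlined);
```

Evaluating inlined and optimized against the same variable bindings gives the same boolean, but the optimized tree never builds the sequence.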

Michael Kay
Saxonica
Martin Probst
2010-09-13 07:21:15 UTC
> (a) naively materialize the sequence of N integers in memory, then obtain
> the size of the sequence.
>
> (b) iterate over the sequence 1 to N and count how many iterations are
> required
>
> (c) (Saxon's implementation) recognize a RangeExpression as an expression
> whose cardinality can be computed in a way specific to that expression (here
> as (end - start + 1)) without actually evaluating the expression. All
> singleton expressions and many expressions with cardinality 0-or-1 fall into
> this category, as do a few others: for example count(for F in S return R)
> reduces to count(S) if R is a singleton expression.

While xDB is in the (c) camp, I think it does make a significant
difference whether you do (a) or (b). Even if certain kinds of
expressions are rare and maybe not worth optimising (debatable in this
context), not doing pervasive lazy evaluation makes certain things
simply impossible, which is probably quite annoying for a user.

Not to say that there aren't a lot of contexts in which xDB will get
into trouble as well...

Regards,
Martin
Xavier Franc
2010-07-26 14:01:19 UTC
> *Pavel Minaev:*
> My take on this is not from user's perspective so much so as from
> library developer's perspective. Let's say that I want to write some
> code reusable between implementations. Can I refactor a single query
> into several functions (say, for the sake of readability), applied in
> a chain, such that the first invocation does some filtering, the
> second one does some expensive per-item processing, and the last one
> does more filtering? If I know that evaluation of sequences is
> pervasively lazy, then the answer is "yes" - since the second function
> will only process items that are demanded by the third function. If I
> don't have such guarantee, then such code may have a very drastic
> performance difference across various implementations. As a library
> developer, quite obviously, I cannot afford that - I want the library
> to be useful to as many people out there as possible
>
That's pretty brave of you... However, as you notice, there is still
no guarantee; moreover, it will be a hard and tedious job.
It seems better to write clean and sensible code, and too bad
for implementations that cannot cope. I am really for natural
selection and against "dumbing down"!
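
For readers who want to see the scenario Pavel describes, here is a minimal JavaScript sketch using generators as a stand-in for lazy sequences. It is an analogy, not XQuery semantics; filterLazy, mapLazy, takeLazy, naturals and expensive are all hypothetical names.

```javascript
// Lazy pipeline stages built on generators; each stage pulls items only on demand.
function* filterLazy(items, keep) {
  for (const item of items) if (keep(item)) yield item;
}

function* mapLazy(items, transform) {
  for (const item of items) yield transform(item);
}

function* takeLazy(items, limit) {
  if (limit <= 0) return;
  let taken = 0;
  for (const item of items) {
    yield item;
    if (++taken >= limit) return;
  }
}

// Unbounded input: 1, 2, 3, ...
function* naturals() {
  for (let i = 1; ; i++) yield i;
}

let expensiveCalls = 0;
// Hypothetical stand-in for the expensive per-item processing step.
function expensive(x) {
  expensiveCalls++;
  return x * x;
}

// Filter, then expensive processing, then a final filter; only 3 results are wanted.
const firstStage = filterLazy(naturals(), (x) => x % 2 === 0);
const secondStage = mapLazy(firstStage, expensive);
const thirdStage = filterLazy(secondStage, (y) => y % 3 === 0);
const results = Array.from(takeLazy(thirdStage, 3));
```

Because every stage is lazy, the expensive step runs only for the items the last stage actually demands. An eager implementation of the same three functions would have to process the whole input first, which on an unbounded input never finishes.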

> Alternatively, I can put additional prerequisites onto the authored
> library, aside from a conformant XQuery 1.x implementation - for
> example, I can further specify that implementation must have lazy
> sequence evaluation for the library to actually be usable. But then
> I'm doing precisely what I suggested doing in this thread, except that
> now every author does it for himself, and their sets of requirements
> above spec conformance need not even match...
>
Of course that sounds nice. As an implementor I don't mind it.
Specified informally, it would have at least the advantage
of increasing the awareness/culture of users and implementors
about such issues.
But making it a part of the language seems very difficult and also
a bit of over-legislating, IMHO.
In addition it would be something probably never seen before...
-

Xavier Franc
Lionel Villard
2010-07-27 14:40:14 UTC
I'm following this interesting thread and I can add a few more insights
on this topic.

First, I think the name XQuery really sucks. XQuery is a query
language, as the name implies. It does not carry the idea of a more
general data manipulation language. I'll put this flaw in the
marketing category. XMLScript might have been a better name (but I
think it's already taken).

Still on the marketing side, what's missing is a sexy vendor-neutral web
site, a bit like php.net and ruby-lang.org. Javascript does not need
such a web site since web developers have no choice on the client:
Javascript is a monopoly.

Some people on the list mentioned the fact that XQuery is more concise
than XSLT. That's true, but compared to Javascript, XQuery is quite
verbose. For instance, declaring an assignable global variable in
XQuery Scripting is ... well, verbose:

declare assignable variable $v;

compared to the Javascript equivalent:

var v;

XQuery verbosity can really become an issue in the browser, especially
for updating the HTML DOM. For instance:

replace value of node $node/@align with 'center'

compared to the Javascript equivalent:

node.align = 'center';

Just to mention the biggies...

Lionel



Martin Probst
2010-07-27 14:58:29 UTC
> Some people on the list mentioned the fact the XQuery is more concise than
> XSLT. That's true but compare to Javascript, XQuery is quite verbose.

I think that is well known, and simply comes back to the fact that
XQuery is a query language, and was designed to be a query language.
If you'd design a general purpose programming language, you'd probably
make sure that common things like declaring functions or variables
have concise syntax.

In XQuery they don't - the only thing that is really concise, and much
more concise than equivalents in most other languages (including JS)
are the query statements (XPath/FLWOR). Which is quite natural.

Again, if one wants a nice general-purpose programming
language, potentially with XML support, one should either start from a
different point with different design goals, or just use one of the
readily available languages (Scala, VB, maybe E4X).

Martin
Michael Kay
2010-07-27 15:28:23 UTC
> I think that is well known, and simply comes back to the fact that
> XQuery is a query language, and was designed to be a query language.
>

Or more particularly, it comes down to the fact that the tradition for
the syntactic style of query languages is rather verbose, deriving from
the SQL and COBOL tradition of pseudo English sentences.

But I'm not sure this is an issue. When XQuery and XPath have
overlapping constructs, the XPath construct tends to be much more
concise, yet I see people consistently preferring the XQuery form. For
example, people seem happy to write

for $e in //employee where $e/salary > 1000 return $e

in preference to

//employee[salary > 1000]

It's also true that there might be individual constructs that have fewer
keystrokes in Javascript, but once you write any complex logic, XQuery
wins hands down on the number of lines of code. That fact isn't going to
sway people away from Javascript - as witness the number of people using
laboriously-written Javascript in preference to XForms.

But I think this is irrelevant. I don't think minor syntactic details
like this have any significant bearing on the adoption of XQuery. In
fact, I think there's a general law of computing that the quality of a
standard has very little bearing on its adoption. People don't use
Javascript (or Windows, or the Qwerty keyboard, or rfc822 email) because
of its goodness, they use it because of its ubiquity.

Michael Kay
Saxonica
Martin Probst
2010-07-27 15:44:43 UTC
I think the tendency of people to write complicated for loops instead
of concise expressions comes from their lack of understanding of the
language, not from a preference for verbosity. I have taught (or
tried to teach) XQuery on several occasions, and most people had
massive problems understanding the nature of sequences, the concept of
the context item, etc., so they probably stick to more complex
expressions out of ignorance and/or fear of getting something wrong. Or
maybe I'm a bad teacher ;-)

> It's also true that there might be individual constructs that have fewer
> keystrokes in Javascript, but once you write any complex logic, XQuery wins
> hands down on the number of lines of code.

I think that is a bit too bold a statement. If you implement
complex querying logic, XQuery certainly wins. But there are some
things that are quite hard to express in XQuery.

> That fact isn't going to sway
> people away from Javascript - as witness the number of people using
> laboriously-written Javascript in preference to XForms.

I think many people don't see XForms as a viable option (though I'm
not exactly sure why, there are several nice client side and server
side implementations available, including our own EMC Formula). Maybe
they fear the added indirection between the browser and their app, and
the added complexity in development, debugging, etc that comes with
it.

> But I think this is irrelevant. I don't think minor syntactic details like
> this have any significant bearing on the adoption of XQuery. In fact, I
> think there's a general law of computing that the quality of a standard has
> very little bearing on its adoption. People don't use Javascript (or
> Windows, or the Qwerty keyboard, or rfc822 email) because of its goodness,
> they use it because of its ubiquity.

Agreed.

Martin
Pavel Minaev
2010-07-28 21:42:25 UTC
Permalink
On Tue, Jul 27, 2010 at 7:44 PM, Martin Probst <mail-2udWM7XjvTsOyoagqWviFgC/***@public.gmane.org> wrote:
> I think the tendency of people to write complicated for loops instead
> of concise expressions comes from their lack of understanding of the
> language, not from a preference for verboseness. I have taught (or
> tried to teach) XQuery at several instances, and most people had
> massive problems understanding the nature of sequences, the concept of
> the context item, etc, so they probably stick to more complex
> expressions out of ignorance and/or fear to get something wrong. Or
> maybe I'm a bad teacher ;-)

Not really. Personally, I often opt for FLWOR over XPath steps
for lengthy and complicated queries (esp. with many predicates),
because I find it to be much more readable, and the layout makes it
easier to place comments in it explaining what is what, and have them
clearly associated with specific parts of the query - which is much
harder with XPath.

Speaking of XQuery syntax more generally, I actually find it to be
very nice. Verbose? Yes, but not excessively so, and in return it is
very readable (probably more so than any other language I know, in
fact), and its syntax is quite uniform and thus predictable. One can
only wish SQL was designed with similar attention to look.
Brett Zamir
2010-07-27 16:06:34 UTC
Permalink
On 7/27/2010 10:58 PM, Martin Probst wrote:
> If you'd design a general purpose programming language, you'd probably
> make sure that common things like declaring functions or variables
> have concise syntax.

Functions don't seem so bad if you leave out the types... (though I use
functions enough (in JavaScript) that even the word "function" is too
long for me).

The required "else" doesn't seem so friendly to RAD though, and yes, a
lot in the prolog seems potentially unnecessary like "declare".

Brett
Mike Sokolov
2010-07-27 16:46:18 UTC
Permalink
OK it's way, way too late to change any of this, but as long as we are
airing pet peeves:

It can be very confusing for C++/Java/Javascript programmers that commas
have such broad syntactic scope. In those languages, commas mostly
appear only in argument lists (OK there are edge cases: static
constructors, comma operator in C++, maybe others?) but in xquery, a
comma can introduce an entirely new statement because a sequence can
appear pretty much anywhere and there is no required statement
terminator (like semicolon). It's a lot of power for only a few pixels!

-Mike

On 07/27/2010 12:06 PM, Brett Zamir wrote:
> On 7/27/2010 10:58 PM, Martin Probst wrote:
>> If you'd design a general purpose programming language, you'd probably
>> make sure that common things like declaring functions or variables
>> have concise syntax.
>
> Functions don't seem so bad if you leave out the types... (though I
> use functions enough (in JavaScript) that even the word "function" is
> too long for me.
>
> The required "else" doesn't seem so friendly to RAD though, and yes, a
> lot in the prolog seems potentially unnecessary like "declare".
>
> Brett
> _______________________________________________
> talk-***@public.gmane.org
> http://x-query.com/mailman/listinfo/talk
David
2010-07-27 17:00:33 UTC
Permalink
If only there were a more power/pixel character we could use to confuse
people
.

-------------------------
David A. Lee
dlee-***@public.gmane.org
http://www.calldei.com
http://www.xmlsh.org


On 7/27/2010 12:46 PM, Mike Sokolov wrote:
> OK it's way, way too late to change any of this, but as long as we are
> airing pet peeves:
>
> It can be very confusing for C++/Java/Javascript programmers that
> commas have such broad syntactic scope. In those languages, commas
> mostly appear only in argument lists (OK there are edge cases: static
> constructors, comma operator in C++, maybe others?) but in xquery, a
> comma can introduce an entirely new statement because a sequence can
> appear pretty much anywhere and there is no required statement
> terminator (like semicolon). It's a lot of power for only a few pixels!
>
> -Mike
>
> On 07/27/2010 12:06 PM, Brett Zamir wrote:
>> On 7/27/2010 10:58 PM, Martin Probst wrote:
>>> If you'd design a general purpose programming language, you'd probably
>>> make sure that common things like declaring functions or variables
>>> have concise syntax.
>>
>> Functions don't seem so bad if you leave out the types... (though I
>> use functions enough (in JavaScript) that even the word "function" is
>> too long for me.
>>
>> The required "else" doesn't seem so friendly to RAD though, and yes,
>> a lot in the prolog seems potentially unnecessary like "declare".
>>
>> Brett
Daniela Florescu
2010-07-28 07:26:45 UTC
Permalink
> but in xquery, a comma can introduce an entirely new statement
> because a sequence can appear pretty much anywhere and there is no
> required statement terminator

Mike,

I am not sure what you mean by that.

Which version of XQuery do you refer to ? (Update ? Scripting ?)

In any case, comma never introduces another "statement" like in
imperative programming languages. At least in none of the XQuery-related
documents that we have so far.

That might have been a misunderstanding.

(The blame is for the document if it is unclear. But which document ?)

Best regards
Dana
Michael Kay
2010-07-28 08:19:45 UTC
Permalink
On 28/07/2010 08:26, Daniela Florescu wrote:
>> but in xquery, a comma can introduce an entirely new statement
>> because a sequence can appear pretty much anywhere and there is no
>> required statement terminator
>
> Mike,
>
> I am not sure what you mean by that.
>

I suspect it's the problem of

for $x in .....
let $y := .....
let $z := .....
where $a = $b
return X, Y, Z

where the operator precedence of the "," often trips people up. I think
there's a general expectation that binary infix operators should have
higher precedence than expressions written as pseudo-English sentences
(commonly referred to as "statements", by analogy with other languages).
I think we got that wrong; but of course, it's impossible to change now.

Michael Kay
Saxonica
Andrew Welch
2010-07-28 09:10:30 UTC
Permalink
On 28 July 2010 08:26, Daniela Florescu <dflorescu-***@public.gmane.org> wrote:
>>  but in xquery, a comma can introduce an entirely new statement because a
>> sequence can appear pretty much anywhere and there is no required statement
>> terminator
>
> Mike,
>
> I am not sure what you mean by that.

What does this output:

for $l in ('a', 'b', 'c')
return $l, 'd'

a d b d c d
a b c d
...others

(that was an interview question I saw recently)
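
For reference, the two candidate readings can be made explicit with parentheses. This is a minimal sketch based on the XQuery 1.0 grammar, in which the return clause takes a single expression (ExprSingle), so the comma applies to the FLWOR expression as a whole. The variable names $as-written and $grouped are illustrative only:

```xquery
(: Same query as above, wrapped in parentheses so it can be bound to a variable. :)
let $as-written := (for $l in ('a', 'b', 'c') return $l, 'd')
(: Parenthesizing the return expression puts 'd' inside the loop body. :)
let $grouped    := (for $l in ('a', 'b', 'c') return ($l, 'd'))
return (
  string-join($as-written, ' '),   (: "a b c d" :)
  string-join($grouped, ' ')       (: "a d b d c d" :)
)
```

The unparenthesized form therefore gives the second of the listed answers; the first answer needs the parentheses around ($l, 'd').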


--
Andrew Welch
http://andrewjwelch.com
Kernow: http://kernowforsaxon.sf.net/
Daniela Florescu
2010-07-28 17:31:49 UTC
Permalink
Andrew,

It's not a matter of statements introduced by ",".

Comma is just an infix operator like +. (Granted, it is not used as
frequently in programming languages as +, but it is still widely used.)

It's a question of the precedence of the operators in the grammar,
and I don't understand why this is different from other languages.

When you learn the semantics of

1+2*3

you have to learn that * has higher precedence than +.

Same here: a FLWOR expression binds more tightly than the comma.

Best regards
Dana


On Jul 28, 2010, at 2:10 AM, Andrew Welch wrote:

> for $l in ('a', 'b', 'c')
> return $l, 'd'
Mike Sokolov
2010-07-28 18:04:47 UTC
Permalink
The comparison with arithmetic operators is a bit of a stretch, since we
all learnt that precedence in elementary school math class. But I do
agree, it's just a question of operator precedence, not a question of
right or wrong - just a question of convention and matching user
expectation. That's why I introduced my complaint with "programmers
used to C++/Java/ etc." I guess I feel there is a principle that in the
absence of a good reason to do things otherwise, one should stick to
convention since it helps with quick understanding for new users. It
was somewhat surprising to me, coming from that background, that the
precedence of "," was so low, and from the other responses, it seems
some others shared (or at least have been made aware of) the same
confusion, although it is, as you say, easily remedied.

-Mike

On 07/28/2010 01:31 PM, Daniela Florescu wrote:
> Andrew,
>
> It's not a matter of statements introduced by ",".
>
> Comma is just an infix operator like +. (granted, not as frequently used
> in programming languages like +, but still widely used).
>
> It's a question of the precedence of the operators in the grammar-
> and I don't understand why is this different from other languages.
>
> When you learn what the semantics of
>
> 1+2*3
>
> you have to learn that * has higher precedence then +.
>
> Same here, FLWOR expressions have higher precedence.
>
> Best regards
> Dana
>
>
> On Jul 28, 2010, at 2:10 AM, Andrew Welch wrote:
>
>> for $l in ('a', 'b', 'c')
>> return $l, 'd'
Daniela Florescu
2010-07-28 18:30:01 UTC
Permalink
Mike

What I understand from this discussion is that there are two
independent problems.

1. Maybe the precedence of "," was wrongly chosen in XQuery 1.0, and
the current design is not the most intuitive.

Even if this is true, it's too late to change that, unfortunately.

2. Maybe instead of a DirectElementConstructor as an expression we should
allow XML fragments (i.e. a sequence of XML DirectElementConstructors,
without commas in between), to make it look and feel more like real XML.

Granted, I see the same misunderstanding in 9 out of 10 of the new
people to whom I teach XQuery.

W3C should consider it.

Best regards
Dana





On Jul 28, 2010, at 11:04 AM, Mike Sokolov wrote:

> The comparison with arithmetic operators is a bit of a stretch,
> since we all learnt that precedence in elementary school math
> class. But I do agree, it's just a question of operator precedence,
> not a question of right or wrong - just a question of convention and
> matching user expectation. That's why I introduced my complaint
> with "programmers used to C++/Java/ etc." I guess I feel there is a
> principle that in the absence of a good reason to do things
> otherwise, one should stick to convention since it helps with quick
> understanding for new users. It was somewhat surprising to me,
> coming from that background, that the precedence of "," was so low,
> and from the other responses, it seems some others shared (or at
> least have been made aware of) the same confusion, although it is,
> as you say, easily remedied.
>
> -Mike
>
> On 07/28/2010 01:31 PM, Daniela Florescu wrote:
>> Andrew,
>>
>> It's not a matter of statements introduced by ",".
>>
>> Comma is just an infix operator like +. (granted, not as frequently
>> used
>> in programming languages like +, but still widely used).
>>
>> It's a question of the precedence of the operators in the grammar-
>> and I don't understand why is this different from other languages.
>>
>> When you learn what the semantics of
>>
>> 1+2*3
>>
>> you have to learn that * has higher precedence then +.
>>
>> Same here, FLWOR expressions have higher precedence.
>>
>> Best regards
>> Dana
>>
>>
>> On Jul 28, 2010, at 2:10 AM, Andrew Welch wrote:
>>
>>> for $l in ('a', 'b', 'c')
>>> return $l, 'd'
Michael Rys
2010-07-28 18:35:58 UTC
Permalink
I remember that we had considerable discussions about both during the design of the language. Unfortunately, there was no easy way to do either if I remember correctly...

Michael

Martin Probst
2010-07-28 21:27:08 UTC
Permalink
> I remember that we had considerable discussions about both during the design of the language. Unfortunately, there was no easy way to do either if I remember correctly...

I think the problem is with lookahead on the grammar level:

"<foo/> <bar" vs "<foo/> <bar/>" vs "<foo/> <bar> 5"

I.e. foo element lower than bar, foo element followed by bar element,
foo element lower than bar greater than 5. This is not ambiguous, but
probably requires infinite lookahead.

Martin
Pavel Minaev
2010-07-28 21:47:11 UTC
Permalink
On Thu, Jul 29, 2010 at 1:27 AM, Martin Probst <mail-2udWM7XjvTsOyoagqWviFgC/***@public.gmane.org> wrote:
>> I remember that we had considerable discussions about both during the design of the language. Unfortunately, there was no easy way to do either if I remember correctly...
>
> I think the problem is with lookahead on the grammar level:
>
> "<foo/> <bar" vs "<foo/> <bar/>" vs "<foo/> <bar> 5"
>
> I.e. foo element lower than bar, foo element followed by bar element,
> foo element lower than bar greater than 5. This is not ambiguous, but
> probably requires infinite lookahead.

An obvious alternative would be to provide some explicit top-level
construct that introduces (and clearly delineates) an XML fragment as
a sequence of direct element constructors, with no need for
interleaving commas.
David Carlisle
2010-07-28 23:28:42 UTC
Permalink
On 28/07/2010 22:47, Pavel Minaev wrote:
> An obvious alternative would be to provide some explicit top-level
> construct that introduces (and clearly delineates) an XML fragment as
> a sequence of direct element constructors, with no need for
> interleaving commas.

<x>......</x>/*

?

David
Michael Kay
2010-07-29 10:11:45 UTC
Permalink
> I think the problem is with lookahead on the grammar level:
>
> "<foo/> <bar" vs "<foo/> <bar/>" vs"<foo/> <bar> 5"
>
> I.e. foo element lower than bar, foo element followed by bar element,
> foo element lower than bar greater than 5. This is not ambiguous, but
> probably requires infinite lookahead.
>
>

It could be done with an extra-grammatical disambiguation rule not
unlike many of the rules we already have: if the ">" at the end of a
direct constructor is followed by "<", treat the "<" as the start of
another direct constructor unless it's immediately followed by a space,
"<", or "=". That's an incompatibility: anyone writing (<foo/> <bar)
would have to add a space; but we've tolerated incompatibilities
affecting highly implausible constructs in the past. No-one writes an
element constructor in a context where the result has to be immediately
atomized.

Michael Kay
Saxonica
Daniela Florescu
2010-07-31 05:52:36 UTC
Permalink
>>
>
> It could be done with an extra-grammatical disambiguation rule not
> unlike many of the rules we already have: if the next thing after
> the ">" at the end of a direct constructor is immediately followed
> by "<", treat the "<" as the start of another direct constructor
> unless it's immediately followed by (space, "<", or "="). That's an
> incompatibility: anyone writing (<foo/> <bar) would have to add a
> space; but we've tolerated incompatibilities to highly-implausible
> constructs in the past. No-one writes an element constructor in a
> context where the result has to be immediately atomized.

I would be much in favor of this.

The fact that one can write a single XML element in XQuery, but not
XML fragments, haunts me every time I teach XQuery, and I am honestly
tired of this.

Best regards
Dana
David Lee
2010-07-31 12:13:02 UTC
Permalink
Hurts me every time I write an XML file as well !!!


David A Lee

On Jul 31, 2010, at 1:52 AM, Daniela Florescu <dflorescu-***@public.gmane.org> wrote:

>>>
>>
>> It could be done with an extra-grammatical disambiguation rule not unlike many of the rules we already have: if the next thing after the ">" at the end of a direct constructor is immediately followed by "<", treat the "<" as the start of another direct constructor unless it's immediately followed by (space, "<", or "="). That's an incompatibility: anyone writing (<foo/> <bar) would have to add a space; but we've tolerated incompatibilities to highly-implausible constructs in the past. No-one writes an element constructor in a context where the result has to be immediately atomized.
>
> I would be much in favor of this.
>
> The fact that one can write a single XML element in XQuery, but not XML fragments, hunts me every time
> I teach XQuery, and I am honestly tired of this.
>
> Best regards
> Dana
Michael Kay
2010-07-29 10:00:26 UTC
Permalink
> It was somewhat surprising to me, coming from that background, that
> the precedence of "," was so low, and from the other responses, it seems
> some others shared (or at least have been made aware of) the same
> confusion, although it is, as you say, easily remedied.

The explanation of why the precedence is so low lies in the overloading
of "," to separate arguments in a function call, which was something
where we really had no choice. This meant that we needed the concept of
"ExprSingle" to mean "an expression not containing a top-level comma" to
define what was allowed as a function argument, and this decision led to
"," having lower precedence than any other operator. I remember there
were various suggestions for an alternative symbol for use as a
sequence-concatenation operator, but they all looked ugly and unnatural.
So this isn't a case of something that happened because no-one realised
the consequences, it was a carefully-considered design decision that was
considered less bad than the alternatives on the table.

I do remember that one of the suggestions was that comma as a
concatenation operator should only work in conjunction with parentheses,
so that a SeqConcatExpr would be

"(" (expr ",")* expr ")"

rather than

(expr ",")* expr

in which case constructs like "return a,b" would be errors rather than
being mis-parsed. With hindsight that might have been a better decision,
but it would have carried its own set of surprises; use cases where the
expression appears within curly braces, e.g. in a function body, were
particularly noticeable.
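
The overloading of "," described above shows up in any function call. A minimal sketch using the standard count() function: inside an argument list the comma separates arguments, so a sequence passed as a single argument needs its own parentheses.

```xquery
(: The inner parentheses make (1, 2, 3) one argument: a sequence of three items. :)
count((1, 2, 3))
(: returns 3 :)

(: Without them, count(1, 2, 3) would be a call with three arguments.     :)
(: count() is declared with a single parameter, so that form is rejected  :)
(: as a static error rather than being read as a sequence.                :)
```

This is the reason a function argument is an ExprSingle: a top-level comma inside the argument list always means "next argument".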

Michael Kay
Saxonica
Martin Probst
2010-07-29 10:09:45 UTC
Permalink
> use cases where the expression appears within curly braces, e.g. in a
> function body, were particularly noticeable.

That is of course a question of taste, but I think I'd prefer such
an explicit way of sequence construction. Whatever, water under the
bridge...

Martin
Andrew Welch
2010-07-29 10:29:38 UTC
Permalink
On 29 July 2010 11:09, Martin Probst <mail-2udWM7XjvTsOyoagqWviFgC/***@public.gmane.org> wrote:
>> use cases where the expression appears within curly braces, e.g. in a
>> function body, were particularly noticeable.
>
> That is a of course a question of taste, but I think I'd prefer such
> an explicit way of sequence construction. Whatever, water under the
> bridge...

Can't you enforce that as a good practice "style guide" rule?


--
Andrew Welch
http://andrewjwelch.com
Kernow: http://kernowforsaxon.sf.net/
Andrew Welch
2010-07-28 09:26:06 UTC
Permalink
On 28 July 2010 08:26, Daniela Florescu <dflorescu-***@public.gmane.org> wrote:
>>  but in xquery, a comma can introduce an entirely new statement because a
>> sequence can appear pretty much anywhere and there is no required statement
>> terminator
>
> Mike,
>
> I am not sure what you mean by that.
>
> Which version of XQuery do you refer to ? (Update ? Scripting ?)
>
> In any case, comma never introduces another "statement" like in
> imperative programming languages. At least in none of the XQuery-related
> documents that we have so far.

Here's another example: say you have a function to return the HTML
<head> contents. In XSLT you could do:

<xsl:function name="f:getHead">
<title>some title</title>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
</xsl:function>

the xquery equivalent is:

declare function f:getHead() {
<title>some title</title>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
};

...but that causes an error: it's missing a comma between the two constructors:

declare function f:getHead() {
<title>some title</title>,
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
};



--
Andrew Welch
http://andrewjwelch.com
Kernow: http://kernowforsaxon.sf.net/
Dave Pawson
2010-07-28 10:44:04 UTC
Permalink
On 28 July 2010 10:26, Andrew Welch <andrew.j.welch-***@public.gmane.org> wrote:

> Here's another example, say you have a function to return to the HTML

> the xquery equivalent is:
>
> declare function f:getHead() {
>  <title>some title</title>
>  <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
> };
>
> ...but that causes an error.... its missing a comma:
>
> declare function f:getHead() {
>  <title>some title</title>,
>  <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
> };
>

That's exactly it Andrew.
I find it hard/odd to determine that I need a sequence (hence the comma).
Or more likely, when I'm generating a sequence.
The other infuriating one is when to drop in those {} pairs.


Face it, xquery is an 'odd' syntax that takes some getting used to,
more so when you step beyond simple tutorial class examples.

regards

--
Dave Pawson
XSLT XSL-FO FAQ.
Docbook FAQ.
http://www.dpawson.co.uk
Mike Sokolov
2010-07-28 16:10:21 UTC
Permalink
On 07/28/2010 03:26 AM, Daniela Florescu wrote:
>> but in xquery, a comma can introduce an entirely new statement
>> because a sequence can appear pretty much anywhere and there is no
>> required statement terminator
>
> Mike,
>
> I am not sure what you mean by that.
>
Sorry, I was a bit vague. What I meant was that at one time I had some
confusion due to the difference between:

for $x in (1 to 2)
return (<x>{$x}</x>, <y>3</y>)

and

for $x in (1 to 2)
return <x>{$x}</x>, <y>3</y>

My experience has been that the first statement returns:


<x>1</x>
<y>3</y>
<x>2</x>
<y>3</y>

while the second returns:

<x>1</x>
<x>2</x>
<y>3</y>

since it splits the whole expression at the comma and handles the two
halves as separate statements (flwors).
David
2010-07-28 16:24:05 UTC
Permalink
I agree this has confused me and others ...
but objectively, is this any more confusing than, say, C or Java's use of ";"?

for( int i = 0 ; i < 10 ; i++ )
printf("%d",i); printf("something");

vs
for( int i = 0 ; i < 10 ; i++ ) {
printf("%d",i); printf("something");
}


It's just something you have to learn, especially with "modern"
languages where NL and other whitespace are treated identically,
you can be fooled by indentation and newlines.



-------------------------
David A. Lee
dlee-***@public.gmane.org
http://www.calldei.com
http://www.xmlsh.org


On 7/28/2010 12:10 PM, Mike Sokolov wrote:
> On 07/28/2010 03:26 AM, Daniela Florescu wrote:
>>> but in xquery, a comma can introduce an entirely new statement
>>> because a sequence can appear pretty much anywhere and there is no
>>> required statement terminator
>>
>> Mike,
>>
>> I am not sure what you mean by that.
>>
> Sorry, I was a bit vague. What I meant was that at one time I had
> some confusion due to the difference between:
>
> for $x in (1 to 2)
> return (<x>{$x}</x>, <y>3</y>)
>
> and
>
> for $x in (1 to 2)
> return <x>{$x}</x>, <y>3</y>
>
> My experience has been that the first statement returns:
>
>
> <x>1</x>
> <y>3</y>
> <x>2</x>
> <y>3</y>
>
> while the second returns:
>
> <x>1</x>
> <x>2</x>
> <y>3</y>
>
> since it splits the whole expression at the comma and handles the two
> halves as separate statements (flwors).
Lionel Villard
2010-07-28 16:33:53 UTC
Permalink
Right, and proper formatting tools help there.

Lionel

On Jul 28, 2010, at 12:24 PM, David wrote:

> I agree this has confused me and others ...
> but objectively is this any more confusing then say C or Java's use
> of ";"
>
> for( int i = 0 ; i < 10 ; i++ )
> printf("%d",i); printf("something");
>
> vs
> for( int i = 0 ; i < 10 ; i++ ) {
> printf("%d",i); printf("something");
> }
>
>
> Its just something you have to learn, especially with "modern"
> languages where NL and other whitespace are treated identically,
> you can be fooled by indentation and newlines.
>
>
>
> -------------------------
> David A. Lee
> dlee-***@public.gmane.org
> http://www.calldei.com
> http://www.xmlsh.org
>
>
> On 7/28/2010 12:10 PM, Mike Sokolov wrote:
>> On 07/28/2010 03:26 AM, Daniela Florescu wrote:
>>>> but in xquery, a comma can introduce an entirely new statement
>>>> because a sequence can appear pretty much anywhere and there is
>>>> no required statement terminator
>>>
>>> Mike,
>>>
>>> I am not sure what you mean by that.
>>>
>> Sorry, I was a bit vague. What I meant was that at one time I had
>> some confusion due to the difference between:
>>
>> for $x in (1 to 2)
>> return (<x>{$x}</x>, <y>3</y>)
>>
>> and
>>
>> for $x in (1 to 2)
>> return <x>{$x}</x>, <y>3</y>
>>
>> My experience has been that the first statement returns:
>>
>>
>> <x>1</x>
>> <y>3</y>
>> <x>2</x>
>> <y>3</y>
>>
>> while the second returns:
>>
>> <x>1</x>
>> <x>2</x>
>> <y>3</y>
>>
>> since it splits the whole expression at the comma and handles the
>> two halves as separate statements (flwors).
Xavier Franc
2010-09-13 21:11:01 UTC
Permalink
Well, I did not expect so much debate about my dumb test with count()...

Indeed it is not fair to judge a product based only on such simple tests,
but they can give hints about the degree of sophistication or naivety
of its implementation and the amount of effort put into optimizing it.

I agree with Martin that lazy evaluation is quite important,
not only because it gives better results in general, but also
because it allows better scalability in the case of databases.

In Qizx, we are in case (c) (in Michael Kay's classification):
the count() function is optimized on some types of sequences.
The range expression (1 to N) is of course easy to optimize,
but more generally care is taken to optimize count() by using
database indexes when possible.


Anyway, I am glad these kinds of issues are debated, because I think the
question of speed and efficiency is crucial for the acceptance of XML/XQuery
databases, and we are perhaps no longer at a stage where we can
still say that XML databases "will be optimized in the course of time".

--
Xavier Franc
Qizx design and development