"the files", "get URL" and Unicode File Names

Discussion:

Igor de Oliveira Couto

2010-02-20 21:48:16 UTC

Dear List Members,

I am trying to write a reasonably simple test script that iterates through every file in a chosen directory and gives me the MD5 checksum for each file. I am running into a problem that *may* be related to Unicode, but I'm not certain. The algorithm is like this:

* user selects directory
* get list of files in directory (using 'the files')
* for every file in the list:
** get the file data stream (using 'get URL "binfile:..."')
** get the checksum

The function that gets the data stream using 'get URL' returns 'empty' if it cannot find the file. I am finding that when a file has certain accented or foreign characters in its name, 'get URL' is unable to find it, even though it is using the original, unmodified string returned by 'the files'... What is most puzzling is that this does not happen with *every* file that has an international character in it: files with French, Spanish, German or even Scandinavian characters fly through without a hitch. But if a file has a "ĉ" (c+circumflex) or "ŭ" (u+breve) in its name, for instance, it chokes.

Am I doing something wrong, or missing something basic? - or did I hit a limitation, or bug?

Many thanks for any guidance,

--
Igor de Oliveira Couto
Sydney, Australia

PS - the code in full, for those interested:

1) Make a new Mainstack, and add a field named "folderContents", and a button.
2) Put the following into the button's script:

on mouseUp
   -- Ask for a folder; "it" is empty if the user cancels
   answer folder "Please select a folder:"
   if it is empty then exit mouseUp
   local tDefault, tItems
   -- Remember the current default folder so it can be restored at the end
   put the defaultFolder into tDefault
   set the defaultFolder to it
   put empty into field "folderContents"
   -- "the files" lists the files in the current defaultFolder, one per line
   put the files into tItems
   repeat for each line xLine in tItems
      put "name=" & quote & xLine & quote after field "folderContents"
      put " checksum=" & quote & fileDigest(the defaultFolder & "/" & xLine) & quote & return after field "folderContents"
   end repeat
   filter field "folderContents" without empty
   set the defaultFolder to tDefault
end mouseUp

-- Returns the MD5 digest of pValue as a 32-character hex string
function hexDigest pValue
   local tRes, tMD5
   put md5Digest(pValue) into tMD5
   get binaryDecode("H*",tMD5,tRes)
   return tRes
end hexDigest

-- Returns the hex MD5 of the file at pFile, or empty if the file cannot be found
function fileDigest pFile
   if there is a file pFile then
      get URL ("binfile:" & pFile)
      return hexDigest(it)
   else
      return empty
   end if
end fileDigest
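For comparison, here is a minimal sketch of the same loop in Python rather than Revolution, assuming Python 3, where `os.listdir` called with a `str` path returns file names as Unicode strings and `open` hands them to the operating system's Unicode file APIs. The names `file_digest` and `folder_digests` are hypothetical; they mirror `fileDigest` and `hexDigest` above by reading each file as binary and hex-encoding its MD5.

```python
import hashlib
import os
from typing import Dict


def file_digest(path: str) -> str:
    """Return the hex MD5 of a file's bytes (the role of fileDigest/hexDigest above)."""
    md5 = hashlib.md5()
    with open(path, "rb") as f:
        # Read in 64 KiB chunks so large files are not loaded into memory at once
        for chunk in iter(lambda: f.read(65536), b""):
            md5.update(chunk)
    return md5.hexdigest()


def folder_digests(folder: str) -> Dict[str, str]:
    """Map each regular file name in `folder` to its hex MD5 digest."""
    results = {}
    # A str argument (not bytes) makes os.listdir return names as Unicode str objects
    for name in os.listdir(folder):
        path = os.path.join(folder, name)
        # Skip subfolders, as "the files" does
        if os.path.isfile(path):
            results[name] = file_digest(path)
    return results
```

For example, `folder_digests("/path/to/folder")` returns a dictionary from file name to checksum. Because the name returned by `os.listdir` is passed straight back to `open`, a name containing "ĉ" or "ŭ" is never squeezed through a narrower character set on the way.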

stephen barncard

2010-02-21 05:25:12 UTC

Igor, why don't you try using "the detailed files", URLDecode it and parse
out the file names? Perhaps URLDecoding will preserve those characters.

Just a thought. Not tested.
-------------------------
Stephen Barncard
San Francisco
http://houseofcubes.com/disco.irev

use-revolution mailing list
Please visit this url to subscribe, unsubscribe and manage your
http://lists.runrev.com/mailman/listinfo/use-revolution

Igor de Oliveira Couto

2010-02-21 05:32:14 UTC

Dear Stephen,

Post by stephen barncard
Igor, why don't you try and use "the detailed files", URLDecode it and parse
out the filenames? Perhaps URLDecoding can preserve those characters.
just a thought. Not tested.

I tried it, and it doesn't work. When the file name contains a "ĉ" (c+circumflex), for instance, 'the detailed files' encodes it as two separate characters: a "c" and a "^" circumflex. When I URLDecode it, I therefore end up with "c^". That is a different file name, so Revolution's functions (get URL, there is a file, etc.) cannot find the file...
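Igor's "c" followed by "^" is consistent with the name being stored in Unicode decomposed form (NFD): "ĉ" (U+0109) becomes "c" followed by U+0302 COMBINING CIRCUMFLEX ACCENT. As far as I know, the Mac OS X HFS+ file system stores file names in a decomposed form, so the two spellings look identical on screen but compare unequal as strings. The Python sketch below (not Revolution) shows the difference, including how each form looks once URL-encoded, and compares names after normalizing both to NFC; `same_file_name` is a hypothetical helper, not part of any Revolution API.

```python
import unicodedata
from urllib.parse import quote

composed = "\u0109"  # "ĉ" as a single code point (NFC)
decomposed = unicodedata.normalize("NFD", composed)  # "c" + U+0302 combining circumflex

print(len(composed), len(decomposed))  # 1 2
print(quote(composed))    # %C4%89
print(quote(decomposed))  # c%CC%82


def same_file_name(a: str, b: str) -> bool:
    """Compare two names after normalizing both to NFC (hypothetical helper)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


print(composed == decomposed)                # False
print(same_file_name(composed, decomposed))  # True
```

Normalizing before comparing only helps, though, if the engine can also pass the resulting Unicode name to the file system intact.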

Please do send any more suggestions, though. I really want this to work, and I'll try anything!!! :)

Kind Regards,

--
Igor de Oliveira Couto
Sydney, Australia

Bernard Devlin

2010-02-21 11:02:57 UTC

On Sun, Feb 21, 2010 at 5:32 AM, Igor de Oliveira Couto

Please do send any more suggestions, though. I really want this to work, and I'll try anything!!! :)

Igor, I tried with Vista and there's a bit of good news and a lot of
bad news. The good news is it doesn't choke on a file with a 'ĉ' in
the name, but it does replace it with a 'c' when getting 'the files'.
Because of that substitution, "there is a file" can't find the file,
so the checksum fails.
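One plausible explanation, consistent with Shao Sean's remark later in the thread that Rev needs the older non-Unicode system calls, is that the engine passes file names through a legacy single-byte code page (Windows-1252 on a Western Windows system, Mac OS Roman on the Mac). Both code pages contain the accented letters used in French, Spanish, German and the Scandinavian languages, but neither contains "ĉ" or "ŭ", which would explain why only some names fail; Windows substituting a look-alike "c" would fit Bernard's result. This is an assumption, not something confirmed in the thread. The Python sketch below (not Revolution) checks which names survive a round trip through each code page; `survives_code_page` and the sample names are hypothetical.

```python
def survives_code_page(name: str, codec: str) -> bool:
    """True if `name` can be encoded in the legacy `codec` and decoded back unchanged."""
    try:
        return name.encode(codec).decode(codec) == name
    except UnicodeEncodeError:
        # At least one character has no slot in this code page
        return False


names = ["café", "niño", "Größe", "smørbrød", "ĉapelo", "ŭato"]
for codec in ("cp1252", "mac_roman"):
    for name in names:
        print(codec, name, survives_code_page(name, codec))
```

Under this assumption, the first four names round-trip in both code pages while the last two do not, matching the pattern Igor describes.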

In the message box I tried "there is a file" using a file with a ĉ in
the name. Not only did Rev answer false when there was such a file,
the message box also replaced the ĉ in the input field with either
several spaces or a tab.

Thinking that bugs like this surely would have been seen by our French
revolutionaries I switched my input to French and restarted Rev, but
still the problem persisted. I set the useUnicode to true in all your
handlers (the only property that seems vaguely relevant), but still no
go.

Thinking that the issues I saw with ĉ in the message box might be a
message-box-only issue, I tried to just get the file from within a
button. But in my testing, I could not even paste a ĉ into the script
editor.

Finally I checked in RQCC. Unfortunately there seem to be relevant
bug reports dating back almost 2 years:

http://quality.runrev.com/qacenter/show_bug.cgi?id=4897
http://quality.runrev.com/qacenter/show_bug.cgi?id=7019

There are approximately 30% more bugs now than there were 12 months ago.
Everyone's very excited about revMobile, but I see no reason to think
that in 12 months' time the bug count won't be even higher.

It looks like you don't even have the option to parse the output of
shell() (and 'dir' or 'ls' depending on the platform), since the file
operations will fail even if you get the correct file name.

I'm afraid I'm out of suggestions. Maybe someone else can help. I
must be missing something - anyone providing Rev apps to the French or
German markets must have had problems with users having file names
with circumflex accents or umlauts.

Bernard

Igor de Oliveira Couto

2010-02-21 22:08:09 UTC

Post by Bernard Devlin
Finally I checked in RQCC. Unfortunately there seem to be relevant
Post by Shao Sean
Hopefully Rev updates these routines to use the Unicode system calls (on Mac and Windows, not certain about Linux).

I checked the bug system, too, and there seem to be quite a few Unicode-related reports, some dating back a lot longer than 2 years. It is disappointing to see, to say the least. For a product that is trying to gain market share on the web, the lack of full support for international writing systems will be a deal-breaker for many people.

I'm even more surprised that no one has come up with a cross-platform workaround (an external) that can do the job. Does that mean that every application made with Revolution is unable to use international characters when saving files to the local system? I was certain I had read somewhere that there were users of Rev in Japan. Are they all using standard ASCII English?

Dumbfounded,

--
Igor de Oliveira Couto
Sydney, Australia

Jim Bufalini

2010-02-21 22:36:25 UTC

I "believe" RunRev is very aware of the Unicode issues and is actively
working to resolve all of them "soon."

Aloha from Hawaii,

Jim Bufalini

Kenji Kojima

2010-02-21 22:53:32 UTC

Post by Jim Bufalini
I "believe" RunRev is very aware of the Unicode issues and is actively
working to resolve all of them "soon."
Aloha from Hawaii,
Jim Bufalini

I have been hearing "soon" since Rev was at version 2.
I'll have to buy a new English dictionary.

--
Kenji Kojima
http://www.kenjikojima.com/999ViewsRenga/

Kenji Kojima

2010-02-22 01:55:23 UTC

I've been waiting a long, long, long while for just three Unicode bugs to be fixed:

1) Unicode file names
2) Unicode file paths
3) Unicode menus

Once they are fixed, we can make a true Japanese application.

--
Kenji Kojima
http://www.kenjikojima.com/999ViewsRenga/

Igor de Oliveira Couto

2010-02-21 22:54:14 UTC

Dear Jim,

Post by Jim Bufalini
I "believe" RunRev is very aware of the Unicode issues and is actively
working to resolve all of them "soon."

Revolution is a fantastic platform for cross-platform development, and I hope this core issue is addressed soon. Since international text is used throughout all the apps I work on, the current limitations would simply create far too many issues, from interface design to database storage to file manipulation. This is the kind of 'missing core feature' that makes people change frameworks, and that is a great shame, because Rev does have so much to offer in other areas.

I hear that a new version (4.5) is in the works. If this does not break any confidentiality agreements, can a beta tester tell us perhaps if any unicode issues are being addressed in this new version?

Many thanks in advance,

--
Igor de Oliveira Couto
Sydney, Australia

J. Landman Gay

2010-02-21 23:50:51 UTC

Post by Igor de Oliveira Couto
I hear that a new version (4.5) is in the works. If this does not
break any confidentiality agreements, can a beta tester tell us
perhaps if any unicode issues are being addressed in this new
version?

The road plan is confidential so we can't really say here. But rest
assured that Unicode is a big part of future development. RR knows they
need it to be internationally accepted. And I believe Mark Waddingham
has now finished reading the 6000 pages of the Unicode specs. :)

--
Jacqueline Landman Gay | ***@hyperactivesw.com
HyperActive Software | http://www.hyperactivesw.com

Jeffrey Massung

2010-02-22 00:14:30 UTC

Post by J. Landman Gay
The road plan is confidential so we can't really say here. But rest assured that Unicode is a big part of future development. RR knows they need it to be internationally accepted. And I believe Mark Waddingham has now finished reading the 6000 pages of the Unicode specs. :)

While the road plan may be confidential, after my evaluation period was up and I was considering a purchase, I emailed RR with a list of my concerns and asked what was in the pipeline to address them. They replied quite promptly. I learned that some of my problems/concerns were already handled (the reply even included sample code), some were being actively addressed (and I was told how), and some were given only lip service, so they likely weren't on the road map.

I'd take some time, email RR, tell them your business, why certain features may be important to you, and ask them to be frank about where those features are on their roadmap (if they are at all). I think you'll be pleasantly surprised at the professionalism in the response you get.

My 2 cents.

Jeff M.

Shao Sean

2010-02-21 13:05:07 UTC

When adding some features into my Mac external to handle file paths,
I had to use older non-Unicode system calls so Rev could handle the
file paths. Hopefully Rev updates these routines to use the Unicode
system calls (on Mac and Windows, not certain about Linux).

Shao Sean

2010-02-22 02:04:50 UTC

I can look into an external workaround while we wait for Rev to fix this
at the engine level, if anyone is brave enough to do some testing.
(Email me off-list.)