Discussion:
[erlang-questions] json to map
Roelof Wobben
2015-08-25 12:16:03 UTC
Permalink
Hello,

As a challenge I need to convert a JSON file to an Erlang map.

So I have this file:

{
"foo": {
"id": 1,
"username": "Foo Foo",
"first": "Foo",
"last": "Foo",
"password": "foo",
"email": "***@hapiu.com",
"scope": ["admin", "user"]
},
"bar": {
"id": 2,
"username": "Bar Head",
"first": "Bar",
"last": "Head",
"password": "bar",
"email": "***@hapiuni.com",
"scope": ["user"]
}
}


as far as I can see it's a tuple of tuples.
What are the steps to convert it?

I do not need the code, just some steps which point me in the right
direction.

Roelof


Bengt Kleberg
2015-08-25 12:21:24 UTC
Permalink
Greetings,

If you replace all
:
with
,

and add a final
.

you can use file:consult/1 to read the file directly.
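For example, after those two replacements the file is an ordinary Erlang
term (the strings stay double-quoted), roughly like this abbreviated
sketch:

{
"foo", {
"id", 1,
"username", "Foo Foo",
"scope", ["admin", "user"]
},
"bar", {
"id", 2,
"scope", ["user"]
}
}.

and then, with whatever file name you used (the name below is made up):

1> {ok, [Config]} = file:consult("users.config").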


bengt
Post by Roelof Wobben
Hello,
As a challenge I need to convert a JSON file to an Erlang map.
Kenneth Lakin
2015-08-25 12:26:27 UTC
Permalink
Post by Bengt Kleberg
Greetings,
If you replace all
:
with
,
and add a final
.
That's an *excellent* suggestion! I was unaware of file:consult/1.

I bet that one could play with the replacement character to make map
conversion even easier.
Garrett Smith
2015-08-25 12:45:59 UTC
Permalink
Doesn't seem very challenging :)

But that's a valid point - a good programmer cheats!

Another cheat would be to use a JSON parsing library. Again not
particularly challenging, but it does send you down the road of "find
an existing solution to a problem, rather than roll your own".

For serious fun, write a parser! You could roll your own, or again
cheat by using Erlang's pretty darn good yecc module:

http://erlang.org/doc/man/yecc.html
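To give a flavour of it, a toy .yrl grammar that only handles arrays of
numbers might look like the sketch below (the grammar and the token
shapes are made up for illustration; it assumes a tokeniser that
produces {number, Line, Value} plus the punctuation tokens):

Nonterminals value values.
Terminals '[' ']' ',' number.
Rootsymbol value.

value -> number : element(3, '$1').
value -> '[' ']' : [].
value -> '[' values ']' : '$2'.

values -> value : ['$1'].
values -> value ',' values : ['$1' | '$3'].

Running that through yecc:file/1 produces an Erlang module whose parse/1
takes a token list and returns {ok, Value}.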

Roelof Wobben
2015-08-25 14:49:38 UTC
Permalink
Post by Garrett Smith
Doesn't seem very challenging :)
But that's a valid point - a good programmer cheats!
Another cheat would be to use a JSON parsing library. Again not
particularly challenging, but it does send you down the road of "find
an existing solution to a problem, rather than roll your own".
For serious fun, write a parser! You could roll your own, or again
http://erlang.org/doc/man/yecc.html
Hello Garrett,

That seems to be a good idea to keep me busy for some days.
I will read a lot about that topic and hopefully I can make this work.

Roelof


Tony Rogvall
2015-08-25 13:21:05 UTC
Permalink
Or even

replace:
{ with #{
: with =>

and add the final dot.
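That turns the whole file into a single map literal, roughly (abbreviated):

#{
"foo" => #{
"id" => 1,
"username" => "Foo Foo",
"scope" => ["admin", "user"]
},
"bar" => #{
"id" => 2,
"scope" => ["user"]
}
}.

The keys end up as strings rather than binaries, but file:consult/1
should still read it, since a map literal built from constants is an
ordinary term.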

:-)

/Tony
Post by Bengt Kleberg
Greetings,
If you replace all
:
with
,
and add a final
.
you can use file:consult/1 to read the file directly.
Kenneth Lakin
2015-08-25 12:40:15 UTC
Permalink
Post by Roelof Wobben
I do not need the code, just some steps which point me in the right
direction.
Also, if you end up trying to convert a string containing JSON (rather
than a file), a Google search for "erlang string to
terms" turns up an interesting erlang-questions thread from way back.
Richard A. O'Keefe
2015-08-26 05:38:15 UTC
Permalink
Post by Roelof Wobben
Hello,
As a challenge I need to convert a JSON file to an Erlang map.
You need a more precise specification.
A map with an arbitrary key whose associated value
was a binary containing the text of the JSON term
would technically satisfy this.
Post by Roelof Wobben
as far as I can see it's a tuple of tuples.
No, it's not a tuple of tuples. To start with,
it's >>JSON<<, not Erlang. JSON has arrays, not
tuples.

Here's a table that may help

JSON              Erlang
null              null (the atom)
false             false
true              true
a number          the same number
a string          a binary with the UTF-8 encoding
an array ([])     a list
an object ({})    a map
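
Applied to the file above, the target term would therefore look roughly
like this (abbreviated):

#{<<"foo">> => #{<<"id">> => 1,
                 <<"username">> => <<"Foo Foo">>,
                 <<"scope">> => [<<"admin">>, <<"user">>]},
  <<"bar">> => #{<<"id">> => 2,
                 <<"scope">> => [<<"user">>]}}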

Again, you don't say *how* the conversion is to be done.
You could probably whip something up in a few minutes
using lex to make a C program that basically changed
opening " to <<", closing " to ">>, and did the right
things for { } and : . That would convert JSON syntax
to Erlang syntax quite quickly.

Probably what you meant was
WEB-SEARCH(JSON parser for Erlang using maps)
This might lead you to https://github.com/talentdeficit/jsx
or to one of the other JSON libraries with which Erlang is
so liberally equipped.
Roelof Wobben
2015-08-26 05:56:29 UTC
Permalink
Post by Richard A. O'Keefe
Post by Roelof Wobben
Hello,
As a challenge I need to convert a JSON file to an Erlang map.
You need a more precise specification.
A map with an arbitrary key whose associated value
was a binary containing the text of the JSON term
would technically satisfy this.
The exact text of the challenge is here:

Configuration files can be conveniently represented as JSON terms. Write
some functions to read configuration files containing JSON terms and
turn them into Erlang maps. Write some code to perform sanity checks
on the data in the configuration files.

Richard A. O'Keefe
2015-08-27 02:41:04 UTC
Permalink
Post by Roelof Wobben
Configuration files can be conveniently represented as JSON terms.
Yuck. This has "representation" backwards.
Here's what we have, using [thing] for "things"
and (action) for "processes".

In our heads In our processes In our file system
[abstract [stored
configuration -(implement) -> configuration
value] data]
| |
(conversion) (programmed conversion)
| |
v |
[abstract v
JSON value] -(implement) -> [stored JSON value]
| |
v |
(unparse) (programmed unparsing)
| |
[abstract token v
sequence] -(implement) -> [stored token sequence]
| |
(layout + unlex) (programmed layout + unlex)
| |
v |
[abstract character v
sequence] -(implement) -> [stored character sequence]
| |
(Unicode encoding) (programmed encoding)
| |
v |
[abstract byte v
sequence] -(implement) -> [stored byte sequence]
| |
(compress, (programmed compression.
encrypt, encryption, signing,
sign, &c) and so on)
| |
v v
[another abstract [another stored
byte sequence] byte sequence] ---(store)---> [FILE]
There is an ABSTRACT space of JSON terms.

Each of the arrows (down and right) is a "representation" arrow.
The thing at the tip of the arrow represents the thing at the
base of the arrow. The file we end up with is the thing that
does the representing (GIVEN this framework), and the
configuration data is what is represented.

Don't take "stored" too literally. A "stored" data in the
middle column could be a data structure or a communication
pattern. I just mean that it's "inside the computer" in
the sense that it is directly accessible to code.

This diagram must commute, that is, whatever path you take
through the arrows, you must end up with *equivalent* things.

Not equal.

Converting configuration values to JSON values need not be
unique. For example, a set of n elements might be converted
to a JSON array without duplicates in n! ways. But we can
arrange to treat permuted arrays in certain contexts as
equivalent.

Converting JSON values to token sequences is not unique.
For example, a JSON object doesn't *have* any order to it,
but for unparsing, you have to pick an order. Given an
object with n pairs, there are n! ways to order them.
We can arrange to treat those as equivalent.

Unlexing, converting tokens to character sequences, is not
unique. 1, 1e0, 10e-1, 1.0e1, &c are the same, so even
without allowing leading zeros there are hundreds of
ways (but not infinitely many ways) to represent a number
token. Most unicode characters can be represented in two
ways (/ can be represented in three), so a string of n
characters can be unlexed in at least 2**n ways. (It's
worse than that because \u002f and \u002F are equivalent,
so / has four alternatives.)
Layout can insert arbitrary amounts of white space between tokens,
and there are infinitely many ways to do that.

There are multiple definitions of JSON. ECMA 404 stops at
the level of Unicode character sequences, and has nothing
to say about encoding. There are LOTS of encodings.

There are also many compression, encryption, and digital
signature algorithms, which can be freely composed.

JSON qua JSON has nothing to say about how files are encoded
or whether they are compressed, encrypted, or signed. But
to put text into a file, you have to encode it somehow, and
you have to make some decision about other matters. (And
don't get me onto file systems with fixed length records,
where you have to figure out how to fit a 1 million character
string into 128 byte records...)
Post by Roelof Wobben
Write
some functions to read configuration files containing JSON terms and
turn them into Erlang maps.
What if a configuration file represents this JSON term:
[["target","some program"],
["source","some other program"],
["date",[2015,08,27,14,05]],
["gibberish",[3,1,4,1,5,9,2,7]]]

How are you supposed to convert *that* to an Erlang map?
In any way that makes sense?
Oh, I know:

{"": <<"[[\"target...7,]]]>>}

or whatever the syntax for maps is.
It technically satisfies the requirements!

The first thing to do with these exercises is CRITICISE them.
I do not mean to sneer at them and throw them away, but to
start from a presupposition that the language is muddled,
the contents confused, and the requirements either incomplete
or inconsistent. (Like practically *every* requirement we
start with including some published standards. I'm looking
at you, ECMA 404!)

I am not kidding. You have to start out by trying to
understand the requirements, EXPECTING to find problems,
RESOLVING them, and writing down REVISED requirements
that spell out everything you actually need to know.

For example, you might include the following:

- Only the UTF-8 encoding is to be supported.
- No compression, encryption, or signing are to be supported.
- You may assume that the file system treats a file as
an arbitrary sequence of bytes with no record boundaries.
- You are to convert null, false, true to the Erlang atoms
'null', 'false', 'true'.
- You are to convert JSON numbers to Erlang floats.
- You are to convert JSON strings to Erlang binaries.
- You are to convert JSON arrays to Erlang lists;
nothing else is to be converted to a list.
- You are to convert JSON objects to Erlang maps;
nothing else is to be converted to a map.
- You are not to worry about inverting the conversion
from configuration data to JSON terms; there is no
configuration data, that was just put in to make it
interesting.
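
Written down as a type, those rules pin the representation to something
like this sketch (the type name is illustrative):

-type json() :: 'null' | 'false' | 'true'
              | float()
              | binary()
              | [json()]
              | #{binary() => json()}.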
Post by Roelof Wobben
Write some code to perform sanity checks
on the data in the configuration files.
Here is another piece of confusion/incompleteness, or
possibly even questionable advice.

This presupposes some procedure where you FIRST convert
a JSON text stored in a file to some Erlang term and
THEN you check the sanity. Or at least, it seems to.

Another approach is to check as you go so that there is
never any insane Erlang data at all.

This is highly topical, because we've recently seen a
bunch of serious Android security bugs caused by
overly trusting object deserialisation which allowed
objects to be constructed violating their invariants.
In fact this has triggered a burst of work on my
Smalltalk system, because I had a great big OOPS:
oh dear, I have the same problem. So I'm now slogging
through nearly a thousand files turning comments
about invariants into executable code and writing
invariants for the *shameful* number of classes that
had none, so that the deserialisation code can call
each newly reconstructed object's #invariant method
before trusting it.

So I strongly recommend validating data as you parse
it, and if a sanity check fails, crash immediately.
This leaves nothing for subsequent sanity checks to do.

UNLESS you have configuration data that's converted to
JSON terms in such a way that not all terms represent
valid configuration data. But from what you quote,
you haven't been given anything for sanity checks like
that to DO.

All things considered, the exercise appears to be a
cryptic way of saying "WRITE A JSON PARSER".

For what it's worth, my JSON parser in Smalltalk is
117 lines for a tokeniser + 45 lines for a parser.
Being stricter about the input would let me shave
about 20 lines off the total.

Much of the trickiness is in handling strings,
where JSON requires that a character outside the
Basic Multilingual plane must be encoded as a
surrogate pair.

Processing a sequence of characters as an Erlang
string will probably make your life simpler; and
processing a sequence of tokens as an Erlang list
will also be likely to make your life simpler.
Roelof Wobben
2015-08-27 11:04:22 UTC
Permalink
Thanks,

Can this be a way to solve the challenge:
http://www.evanmiller.org/write-a-template-compiler-for-erlang.html

Roelof
Garrett Smith
2015-08-27 14:01:13 UTC
Permalink
Post by Roelof Wobben
Thanks,
http://www.evanmiller.org/write-a-template-compiler-for-erlang.html
I think the challenge is to take smaller steps here...

Either solve this challenge by finding a JSON parser and using it, or
redefine the challenge to "create a simple parser".

Parsing JSON is a challenge, but I think you should define an interim
challenge first - one that's less challenging.

Strictly speaking the challenge of parsing JSON in Erlang (and
therefore being able to generate other forms) is not that challenging
as there are like half a dozen decent libraries to do that already. It
can be a challenge to find them, compile and use them - but not that
challenging. So why not just do that and cross this challenge off your
list?

Then, create a new challenge, which is to parse some simple expression
like "1 + 1" into a term in Erlang. You can then do something
interesting with that term! See hint [1]
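
A minimal way to "do something interesting" with such a term, cheating
with the standard library before you write any scanner or parser of your
own (the function name is made up, and the input needs a trailing dot):

eval(String) ->                    %% eval("1 + 1.") gives 2
    {ok, Tokens, _} = erl_scan:string(String),
    {ok, [Expr]} = erl_parse:parse_exprs(Tokens),
    {value, Value, _Bindings} = erl_eval:expr(Expr, erl_eval:new_bindings()),
    Value.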

This is a good challenge IMO because you'll learn a lot about Erlang
(your goal) and you'll also experience the empowerment of building
your own language, even if a very simple one.

What I'm really trying to say is that it's challenging to define the
right challenges.
Post by Richard A. O'Keefe
The first thing to do with these exercises is CRITICISE them.
But don't criticize mine :)

[1] https://gist.github.com/cooldaemon/13773/6133f606a809dbd05683f290afaac21fbe7e2ce4
Felix Gallo
2015-08-27 16:05:06 UTC
Permalink
Roelof: don't do the JSON parser. Do Garrett's suggestion instead. I
recommend the added facet of trying to build a simple, complete
four-function calculator. For extra credit, add the ability to use
parentheses, which gets you to some fundamental computer science
discoveries.


ROK is, as always, correct that this area (parsing ill-defined
specifications involving text) is one of the Great Problems of Computer
Science, rife with monsters named lex and yacc and their cousins yecc and
leex, in the lair of the dread throne of the Basic Multilingual Plane where
surrogate pairs lurk in the deep pits of context-free grammar, gnawing on
abstract syntax trees.

Nevertheless, the admonition to criticise the exercise doesn't really help
all the Roelofs, because they haven't spent countless years in the trenches
yet, so have no basis to independently formulate the notion that a
maximalist reading of the problem would necessitate a lawyerly reading of
iso/iec 10646, or the implementation of an LR(1) parser, or whatever. The
analysis and subsequent constructive exercise-sculpting does need to come
from a grizzled, irascible, greying mentor; in my opinion.

F.
Roelof Wobben
2015-08-28 07:45:18 UTC
Permalink
I will take the challenge but I'm stuck at the types part.

So far I have this:

-module(time_parser).

-export([]).

-type token :: tInt()
| tWord()
| tSlash()
| tDash()
| tComma().

-type tint() :: integer().
-type tword() :: binary().
-type tSlash() :: binary().
-type tDash() :: binary().
-type tComma() :: binary().


and I see this error: time_parser.erl:5: bad type declaration

Roelof

Post by Richard A. O'Keefe
Post by Roelof Wobben
Thanks,
Can this be a way to solve the challenge : http://www.evanmiller.org/write-a-template-compiler-for-erlang.html
• Erlang is hard to refactor
I don't find manual refactoring harder in Erlang than in
any other language (not excluding Java and Smalltalk).
I haven't tried Wrangler or RefactorErl (see
http://plc.inf.elte.hu/erlang/) yet, but they look good.
• There is no built-in syntax for hash maps
This is no longer true.
• String manipulation is hard
That's a puzzler. I've found string manipulation using
lists *easier* in Erlang than in almost anything but SNOBOL
or Prolog. I would certainly ***MUCH*** rather write a
string -> JSON parser in Erlang than in say Java or even
Lisp. (Of course the Bigloo implementation of Scheme has
special support for lexers and parsers built in, which does
change the picture.)
The question is always "compared with WHAT?" In many case
the key trick for manipulating strings is DON'T. My JSON
parser in Smalltalk, for example, is only concerned with
strings to the extent that they are a nasty problem posed
by JSON that it has to solve; they are not something that
it uses for its own purposes. The tokeniser converts a
stream of characters to a stream of tokens, and the parser
works with tokens, not characters. (Yes, I know about
scannerless parsers, but the factoring has always helped me
to get a parser working. A separate tokeniser is something
that I can *TEST* without having to have the rest of the
parser working.)
Then it turns out that the web page is really about writing
a compiler from "Django Template Language" to Erlang.
"It helps to get a hold of a language specification if there
is one. I am implementing the Django Template Language. There's
not really a spec, but there is an official implementation in Python,"
OUCH! What *IS* it about this industry? Why do we get notations
that become popular where there is no spec (like Markdown,
originally, or JSON, ditto -- it had syntax but no semantics)
or the spec is confused (like XML, where they muddled up
syntax and semantics so that we ended up with several different
semantics for XML, or the first version of RDF, where they
meant to define it in terms of XML semantics, but there wasn't
really one, so they defined it in terms of XML syntax *by mistake*).
That page talks about writing a scanner with an argument to
say what the state is. This is almost always a bad idea.
Each state should be modelled by a separate Erlang function.
Let's see an example of this.
dd/mm/yyyy
dd MON yyyy
MON dd[,] yyyy
yyyy-mm-dd
(By the way, we give matching and cleaning up data that's just
a little bit more complex than this as an exercise to 3rd year
students. Thinking in Java makes it *impossible* for them to
get something like this right in a 2-hour lab session.
Regular expressions are a royal road to ruin.)
I'll do this in Haskell.
data Token
= TInt Int
| TWord String
| TSlash
| TDash
| TComma
tokens :: [Char] -> [Token]
tokens [] = []
tokens (c:cs) | isSpace c = tokens cs
tokens (c:cs) | isDigit c = digits cs (ord c - ord '0')
tokens (c:cs) | isAlpha c = word cs [c]
tokens ('/':cs) = TSlash : tokens cs
tokens ('-':cs) = TDash : tokens cs
tokens (',':cs) = TComma : tokens cs
-- anything else will crash
digits (c:cs) n | isDigit c = digits cs (ord c - ord '0' + n*10)
digits cs n = TInt n : tokens cs
word (c:cs) w | isAlpha c = word cs (toLower c : w)
word cs w = TWord (reverse w) : tokens cs
Converting the tokeniser to Erlang is a trivial exercise for
the reader.
valid_month :: String -> Int
valid_month "jan" = 1
valid_month "january" = 1
...
valid_month "december" = 12
-- anything else will crash
string_to_date :: [Char] -> (Int,Int,Int)
string_to_date cs =
case tokens cs of
[TInt d,TSlash,TInt m,TSlash,TInt y] -> check y m d
[TInt y,TDash, TInt m,TDash, TInt d] -> check y m d
[TInt d,TWord m,TInt y] -> check y (valid_month m) d
[TWord m,TInt d,TComma,TInt y] -> check y (valid_month m) d
[TWord m,TInt d, TInt y] -> check y (valid_month m) d
-- anything else will crash
check :: Int -> Int -> Int -> (Int,Int,Int)
-- left as a boring exercise for the reader.
Converting this to Erlang is also a trivial exercise for the reader.
You will notice that there are multiple scanning functions and
no 'what state am I in?' parameter. Your scanner should KNOW
what state it is in because it knows what function is running.
Yecc is a great tool, but for something like this there's no
real point in it, and even for something like JSON I would
rather not use it.
One thing that Leex and Yecc can do for you
is to help you track source position for reporting
errors. For a configuration file, it may be sufficient to
just say "Can't parse configuration file X as JSON."
OK, the technique I used above is "recursive descent",
which works brilliantly for LL(k) languages with small k.
But you knew that.
Oh yes, this does mean that writing a parser is just like
writing a lexical analyser, except that you get to use
general recursion. Again, you typically have (at least)
one function per non-terminal symbol, plus (if your
original specification used extended BNF) one function
per repetition.
Heck.
s expression
= word
| "(", [s expression+, [".", s expression]], ")".
data SExpr
= Word String
| Cons SExpr SExpr
| Nil
s_expression :: [Token] -> (SExpr, [Token])
s_expression (TWord w : ts) = (Word w, ts)
s_expression (TLp : TRp : ts) = (Nil, ts)
s_expression (TLp : ts) = s_expr_body ts
s_expr_body (TRp : ts) = (Nil, ts)
s_expr_body (TDot : ts) =
let (e, TRp : ts') = s_expression ts
in (e, ts')
s_expr_body ts =
let (f, ts') = s_expression ts
(r, ts'') = s_expr_body ts'
in (Cons f r, ts'')
This is so close to JSON that handling JSON without
"objects" should now be straightforward. And it makes
a good development step.
Bengt Kleberg
2015-08-28 08:31:27 UTC
Permalink
Greetings,

Dialyzer is not my forte, but on line 5 you have a type (token) without
() after it. All other types have () as suffix.
Could this be a problem?


bengt
Post by Roelof Wobben
I will take the challenge but I'm stuck at the types part.
-module(time_parser).
-export([]).
-type token :: tInt()
| tWord()
| tSlash()
| tDash()
| tComma().
-type tint() :: integer().
-type tword() :: binary().
-type tSlash() :: binary().
-type tDash() :: binary().
-type tComma() :: binary().
and I see this error : time_parser.erl:5: bad type declaration
Roelof
Richard A. O'Keefe
2015-08-28 09:02:12 UTC
Permalink
Post by Roelof Wobben
I will take the challenge but I'm stuck at the types part.
By far the easiest way to convert my Haskell sample code
to Erlang is to throw the types completely away, or just
leave them as comments.
Post by Roelof Wobben
-module(time_parser).
-export([]).
-type token :: tInt()
| tWord()
| tSlash()
| tDash()
| tComma().
-type tint() :: integer().
-type tword() :: binary().
-type tSlash() :: binary().
-type tDash() :: binary().
-type tComma() :: binary().
Leaving the omitted () aside, this isn't even CLOSE to a
good translation of the Haskell data type.
This makes tword -- should have been tWord and both of
them should be t_word in idiomatic Erlang -- and tSlash
and tDash and tComma the *same* type. But the whole point
is to make them DIFFERENT.

-type token()
:: {int,integer()}
| {word,string()} %% NOT binary!
| '/'
| '-'
| ','.

The alternatives MUST be such that they cannot be
confused with one another.

tokens([]) -> [];
tokens([C|Cs]) when C =< 32 -> tokens(Cs);
tokens([C|Cs]) when $0 =< C, C =< $9 -> digits(Cs, C-$0);
tokens([C|Cs]) when $a =< C, C =< $z -> word(Cs, [C]);
tokens([C|Cs]) when $A =< C, C =< $Z -> word(Cs, [C]);
tokens("/"++Cs) -> ['/' | tokens(Cs)];
tokens("-"++Cs) -> ['-' | tokens(Cs)];
tokens(","++Cs) -> [',' | tokens(Cs)].

Of course this wants to convert the letters to lower case,
and it would be really nice to have standard
is_digit(Codepoint[, Base])
is_lower(Codepoint)
is_upper(Codepoint)
is_alpha(Codepoint)
is_space(Codepoint)
guards. No, macros are NOT good enough;
-define(is_alpha(C), $a =< ((C) bor 32) andalso ((C) bor 32) =< $z).
was fine for ASCII, but failed dramatically for Latin 1,
and these are the days of Unicode. It's nearly 9pm, time
to go home. Maybe I should write an EEP about this.

You might say, well, use regular expressions.
Match letters using the POSIX '[[:alpha:]]' construction.
But what does _that_ rely on, eh?

I have completed the translation of the tokeniser from
Haskell to Erlang, and it is pretty much line for line,
and it works.

4> t:t("Jan 26, 1942").
[{word,"jan"},{int,26},',',{int,1942}]
5> t:t("You need gumboots").
[{word,"you"},{word,"need"},{word,"gumboots"}]
6> t:t("Can you dance the Watusi?").
** exception error: no function clause matching t:tokens("?") (t.erl, line 7)
in function t:word/2 (t.erl, line 22)
in call from t:word/2 (t.erl, line 22)
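
For reference, the two helpers those clauses call might look roughly
like this (a reconstruction from the Haskell version and the transcript
above, not necessarily the code behind t.erl):

digits([C|Cs], N) when $0 =< C, C =< $9 -> digits(Cs, N*10 + C - $0);
digits(Cs, N) -> [{int,N} | tokens(Cs)].

word([C|Cs], W) when $a =< C, C =< $z -> word(Cs, [C|W]);
word([C|Cs], W) when $A =< C, C =< $Z -> word(Cs, [C|W]);
word(Cs, W) -> [{word, string:to_lower(lists:reverse(W))} | tokens(Cs)].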

Here's a curious thought.

The use of [Token | tokens(Cs)] means that the stack builds up
a tower of tokens/1 calls, one per token. By passing the list
of tokens so far through, this can all be tail calls.

tokens(Cs) ->
lists:reverse(tokens(Cs, [])).

...
tokens("/"++Cs, Ts) -> tokens(Cs, ['/'|Ts]);
...

But then there's that reversal step. Not a big deal, BUT
it's no harder to parse JSON backwards than it is to parse
JSON forwards! (Even if you allow JavaScript comments,
they disappear in tokenising, so the *tokens* can be parsed
backwards easily.) This is a peculiarity of JSON. I think
you can pull the same trick with XML: lex it forwards, parse
the token sequence backwards.
Roelof Wobben
2015-08-28 11:04:14 UTC
Permalink
Post by Richard A. O'Keefe
Post by Roelof Wobben
I will take the challenge but I'm stuck at the types part.
By far the easiest way to convert my Haskell sample code
to Erlang is to throw the types completely away, or just
leave them as comments.
Post by Roelof Wobben
-module(time_parser).
-export([]).
-type token :: tInt()
| tWord()
| tSlash()
| tDash()
| tComma().
-type tint() :: integer().
-type tword() :: binary().
-type tSlash() :: binary().
-type tDash() :: binary().
-type tComma() :: binary().
Leaving the omitted () aside, this isn't even CLOSE to a
good translation of the Haskell data type.
This makes tword -- should have been tWord and both of
them should be t_word in idiomatic Erlang -- and tSlash
and tDash and tComma the *same* type. But the whole point
is to make them DIFFERENT.
I have made them the same type because all three are only one character.
I have used the code from fiffy as a reference to make my own.
I understand that they must be different, but at some point they are the
same.
Post by Richard A. O'Keefe
-type token()
:: {int,integer()}
| {word,string()} %% NOT binary!
| '/'
| '-'
| ','.
The alternatives MUST be such that they cannot be
confused with one another.
tokens([]) -> [];
tokens([C|Cs]) when C =< 32 -> tokens(Cs);
tokens([C|Cs]) when $0 =< C, C =< $9 -> digits(Cs, C-$0);
tokens([C|Cs]) when $a =< C, C =< $z -> word(Cs, [C]);
tokens([C|Cs]) when $A =< C, C =< $Z -> word(Cs, [C]);
tokens("/"++Cs) -> ['/' | tokens(Cs)];
tokens("-"++Cs) -> ['-' | tokens(Cs)];
tokens(","++Cs) -> [',' | tokens(Cs)].
Pity, you gave me the answer. Now I can do a copy/paste and go on, and
the next time I will do it wrong again.
That is why I want to try this one on my own and make my own mistakes
and learn from them, make more mistakes, and also learn from those.
Loïc Hoguin
2015-08-28 11:15:07 UTC
Permalink
Post by Roelof Wobben
Pity, you gave me the answer. Now I can do a copy/paste and go on, and
the next time I will do it wrong again.
That is why I want to try this one on my own and make my own mistakes
and learn from them, make more mistakes, and also learn from those.
Short piece of advice: if you do want to make your own mistakes and
learn from them, *you have to find the solutions on your own*. Even if
they are incomplete or barely working solutions.

Practice makes perfect. Not asking questions every time things don't
work. Even if people don't give you the solution directly.

If you can't solve certain problems, it means you are lacking required
knowledge, so move back to earlier problems and continue learning from
there. Or try to give yourself a few gradually harder challenges until
you get to the point where you can solve the problems you got stuck with.

Considering your recent questions, you clearly lack knowledge that's
*very basic* so I am not sure why you start writing typespecs and
parsers which are advanced topics.
--
Loïc Hoguin
http://ninenines.eu
Author of The Erlanger Playbook,
A book about software development using Erlang
Hugo Mills
2015-08-28 12:27:14 UTC
Permalink
Post by Loïc Hoguin
Short piece of advice: if you do want to make your own mistakes and
learn from them, *you have to find the solutions on your own*. Even if
they are incomplete or barely working solutions.
I second this.

It seems to me that you're treating the exercises as a simple
checklist of things you have to do before moving on to the next topic.
That's not a particularly good approach to take. The world doesn't
need another implementation of counting the number of functions in
each module in erlang. What it does need is another person who knows
how to use list comprehensions and folds on structured data to turn it
into the desired output (whatever that desired output is).

So... when you've done an exercise, don't just ask if the code
works. Ask yourself what that exercise was trying to help you to
learn, in the context of the chapter in question. An exercise asking
for a JSON parser may be written in exactly the same way, but will
have different learning goals depending on context. Consider:

- You've just learned about I/O and binaries
versus
- You've just learned about maps
versus
- You've just learned about leex and yecc, the parser-generation tools

In each case, your solution to writing a JSON "parser" is going to
look very different, because you're using the problem *to exercise
your new knowledge*. The author doesn't care about having another JSON
parser -- there's lots of them out there already. What they're trying
to do is get you to think about the problem *in terms of the thing
they've just taught you*.

One thing that's missing in your questions here, and which is
leading inevitably to misleading recommendations (like using jsx or
leex in this particular case) is the context: you're not saying what
the goal of the learning experience is. As a result, you're getting
good general-purpose engineering solutions to the *problem*, rather
than things that will help you *learn*.

The best people to listen to here are generally the ones who ask
you the awkward or difficult questions, because they're generally the
ones who are trying to get you to think about the problem in a
particular way. If you can answer those questions for yourself, you
will usually be thinking about the problem in a way that will get you
to the answer. If you can learn to think about problems in that way on
your own, then you will actually be able to write useful code. This
comes with practice, and lots of fiddling and failure on the way, but
it's not something that can be taught directly as a step-by-step
process to follow.

So, to reinforce the primary point here: consider the exercises as
*practice* for what you were being taught, not as goals in their own
right. Do the practice, not "the exercises". The goal, for you, is to
be able to take the separate pieces and techniques and assemble them
into something that solves the problem at hand.
Post by Loïc Hoguin
Considering your recent questions, you clearly lack knowledge that's
*very basic* so I am not sure why you start writing typespecs and
parsers which are advanced topics.
This is, I think, a symptom of the problems I've talked about
above. You do seem to have treated the exercises as "things to get
past", rather than "things to practice with". As a novice to
programming as well as Erlang, you may well have to have more
practice.

As an extra exercise in each chapter, try coming up with your own
(simple) problem that shows off your skills with the techniques used
in that chapter. This has two parts to it: thinking of something
appropriate (which is itself a fairly hard problem), and then writing
the code for it (which is easier, if you came up with a suitable
problem in the first step).

I used to be a mathematician(*), so I've got a whole raft of
interesting little exercises I can fiddle around with when I'm
learning the basics of a new language. Probably my first self-driven
code in erlang was something to generate all possible permutations of
a list. My first (and, to date, only) code in Julia was,
appropriately, a Julia set generator. If you have any kind of
technical or scientific training, think of a simple problem -- one you
could do on a piece of paper -- based on structures and concepts you
already know, and try to solve it in code.

You should be thinking, when you read the text of your book, "ah,
now I understand, I could use this for <this-part> of
<my-problem-B>". Make connections with the things you already
understand and know how to do. See if the stuff in the new chapter
helps you understand any of the problems in the previous chapter any
better.

That turned out to be longer than I originally intended. I hope
it's useful, anyway.

Hugo.

(*) I'm better now.
--
Hugo Mills | Great oxymorons of the world, no. 10:
***@... carfax.org.uk | Business Ethics
http://carfax.org.uk/ |
PGP: E2AB1DE4 |
Hugo Mills
2015-08-28 13:11:06 UTC
Permalink
[Roelof, apologies for missing you out on the cc on my last mail -- I
hope you got it through the mailing list. I don't know how that
happened.]
Post by Hugo Mills
The best people to listen to here are generally the ones who ask
you the awkward or difficult questions, because they're generally the
ones who are trying to get you to think about the problem in a
particular way. If you can answer those questions for yourself, you
will usually be thinking about the problem in a way that will get you
to the answer. If you can learn to think about problems in that way on
your own, then you will actually be able to write useful code. This
comes with practice, and lots of fiddling and failure on the way, but
it's not something that can be taught directly as a step-by-step
process to follow.
Sorry to drone on here, but I wanted to follow up on this point a
bit.

Most problems in CS don't have linear paths to the solution. You
don't just start with "here's the input, so I do this, do this, do
this, and there's the answer". You have to pick at them around the
edges, find the pieces you know how to do. Consider doing something
because it looks like it might get you closer to the structure you
need. Work from both ends at once: you have some data as input, and
you want a particular structure as output; what can you do to the
input to simplify it? What's the simplest data that you could use to
generate the output, and how would you do it? Do those two meet in the
middle? If not, try again and work the edges towards each other. Is
there a simpler version of the problem you *can* solve? Do that first,
then see if you can modify it to deal with the other pieces.

Example: For this JSON-parsing problem, what's your input? (I'd
guess you just read the whole file in as a binary or a string to start
with -- pick one, do it; bear in mind what the chapter is trying to
teach you, as this may make a difference in which one you pick). Given
input in that form, if the first thing in your input is a plain quoted
string, can you turn that into an Erlang string and a piece of unused
input? Do that. What about a number to an Erlang number? Do that. An
unquoted string to an atom? Do that. Can you turn a simple JSON array
into an Erlang list of terms (and some unused input)? Do that. Can
you modify that to parse an array of arrays? Do that. Can you turn a
simple non-nested JSON map into an Erlang map (or dict, or whatever)?
Do that. Can you use those as building blocks to parse a full JSON
structure? You're done. It'll be a pretty useless JSON parser for
practical purposes, but that's OK, you're just learning. (To repeat my
last email, the output of your work is not a program; the output is a
programmer).
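
To make the first of those steps concrete, a "leading quoted string"
helper can be as small as the sketch below (names made up; it ignores
escape sequences entirely, which real JSON strings need):

parse_string([$" | Cs]) -> parse_string(Cs, []).

parse_string([$" | Rest], Acc) -> {lists:reverse(Acc), Rest};
parse_string([C | Cs], Acc) -> parse_string(Cs, [C | Acc]).

So parse_string("\"foo\": 1}") returns {"foo", ": 1}"}: the value plus
the unused input, which is exactly the shape every later step can reuse.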

Try writing down (in your own language) a problem description for
each of the stages above, and treat it as a self-contained problem.
Some of them will be trivial. Some will be a bit more complicated. The
later ones will build on the earlier ones. Write code (or modify
existing code) for each one. Being able to do this -- break a problem
down into separate pieces you know how to solve -- is the essence of
writing software, and it's the hardest thing to learn how to do well.

Hugo.
--
Hugo Mills | Jazz is the sort of music where no-one plays
***@... carfax.org.uk | anything the same way once.
http://carfax.org.uk/ |
PGP: E2AB1DE4 |
Roelof Wobben
2015-08-28 13:18:37 UTC
Permalink
Thanks for the advice.

Roelof
Roelof Wobben
2015-08-28 19:20:51 UTC
Permalink
One question I cannot find the answer to.
 
So far I have this:

-module(time_parser).

-export([tokens/1]).

tokens([]) -> [];

tokens ([Head|Rest]) when Head =:= 32 ->
tokens(Rest);

tokens ([Head|Rest]) when Head >= $0 , Head =< $9 ->
digits(Head,Rest) ;

%tokens([Head|Rest]) when Head >= "a" , Head =< "z" ->
% word(Rest, [Head]);

tokens (['/'| Rest]) ->
[ '/' | tokens (Rest) ];

tokens (['-' | Rest]) ->
[ '-' | tokens (Rest)];

tokens ([','| Rest]) ->
[ '_' | tokens (Rest)].


digits( Number, [Head | Rest]) when Head >= $0 , Head =< $9 ->
digits(Number * 10 + Head , Rest);

digits(Number, Number2) when Number >= $0 , Number =< $9 ->
Number + Number2.

So I do this in erl:

7> c(time_parser).
{ok,time_parser}
8> time_parser:tokens([10]).
** exception error: no function clause matching time_parser:tokens("\n")
(time_parser.erl, line 5)

Where does the \n come from?

Roelof
Hugo Mills
2015-08-28 19:30:47 UTC
Permalink
Post by Roelof Wobben
One question where I cannot find the answer to.
-module(time_parser).
-export([tokens/1]).
tokens([]) -> [];
tokens ([Head|Rest]) when Head =:= 32 ->
tokens(Rest);
tokens ([Head|Rest]) when Head >= $0 , Head =< $9 ->
digits(Head,Rest) ;
%tokens([Head|Rest]) when Head >= "a" , Head =< "z" ->
% word(Rest, [Head]);
tokens (['/'| Rest]) ->
[ '/' | tokens (Rest) ];
tokens (['-' | Rest]) ->
[ '-' | tokens (Rest)];
tokens ([','| Rest]) ->
[ '_' | tokens (Rest)].
digits( Number, [Head | Rest]) when Head >= $0 , Head =< $9 ->
digits(Number * 10 + Head , Rest);
digits(Number, Number2) when Number >= $0 , Number =< $9 ->
Number + Number2.
so I do this in erl;
7> c(time_parser). {ok,time_parser}
8> time_parser:tokens([10]).
** exception error: no function clause matching
time_parser:tokens("\n") (time_parser.erl, line 5)
where do the /n come from ?
ASCII 10 is a newline character, usually rendered as \n.
 
You passed a list containing the single number 10. In Erlang,
strings are lists of integers, with each integer being the code
(a Unicode code point) of a character of the string.
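 
For example, in the shell (an illustrative session; the prompt numbers
are arbitrary):

1> [10].
"\n"
2> [104, 105].
"hi"
3> "hi" =:= [104, 105].
true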

Hugo.
--
Hugo Mills | Hey, Virtual Memory! Now I can have a *really big*
***@... carfax.org.uk | ramdisk!
http://carfax.org.uk/ |
PGP: E2AB1DE4 |
Kevin Montuori
2015-08-28 19:33:56 UTC
Permalink
8> time_parser:tokens([10]).
rw> ** exception error: no function clause matching
rw> time_parser:tokens("\n") (time_parser.erl, line 5)

rw> where do the /n come from ?

What happens when you type

1> ([10]).

in the REPL?

k.
--
Kevin Montuori
***@gmail.com
Hugo Mills
2015-08-28 19:47:10 UTC
Permalink
Post by Roelof Wobben
8> time_parser:tokens([10]).
rw> ** exception error: no function clause matching
rw> time_parser:tokens("\n") (time_parser.erl, line 5)
rw> where do the /n come from ?
What happens when you type
1> ([10]).
in the REPL?
Roelof: This is the kind of question that I was saying you should
be paying attention to. If you can answer this question and explain
it, you've got your answer to your original question. You should have
the knowledge from earlier reading in your book to explain the results
you see.

(Or just read my mail from a few minutes ago, which gives you the
answer directly... my teaching error).

Hugo.
--
Hugo Mills | Hey, Virtual Memory! Now I can have a *really big*
***@... carfax.org.uk | ramdisk!
http://carfax.org.uk/ |
PGP: E2AB1DE4 |
Roelof Wobben
2015-08-28 19:52:04 UTC
Permalink
Post by Hugo Mills
Post by Roelof Wobben
8> time_parser:tokens([10]).
rw> ** exception error: no function clause matching
rw> time_parser:tokens("\n") (time_parser.erl, line 5)
rw> where do the /n come from ?
What happens when you type
1> ([10]).
in the REPL?
Roelof: This is the kind of question that I was saying you should
be paying attention to. If you can answer this question and explain
it, you've got your answer to your original question. You should have
the knowledge from earlier reading in your book to explain the results
you see.
(Or just read my mail from a few minutes ago, which gives you the
answer directly... my teaching error).
Hugo.
Thanks,

I understand it now. My mind is still not set up for seeing this sort of
thing. Maybe too late.
Also I see that I have to rewrite my digits function. When entering "10"
I see the answer 538, which is 49 (the character '1') * 10 + 48 (the
character '0').
Now time to sleep and rethink it.

Roelof
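 
For later reference, one possible shape for that rewrite -- a sketch
only, with an invented module name, handling just digits and spaces.
The key points are subtracting $0 before accumulating and returning
the unused input alongside the number:

-module(digits_sketch).
-export([tokens/1]).

%% A sketch only: integers separated by spaces, nothing else.
tokens([]) -> [];
tokens([$\s | Rest]) ->
    tokens(Rest);
tokens([C | Rest]) when C >= $0, C =< $9 ->
    {Int, Rest1} = digits(C - $0, Rest),
    [Int | tokens(Rest1)].

%% Accumulate decimal digits; return {Value, UnusedInput}.
digits(Acc, [C | Rest]) when C >= $0, C =< $9 ->
    digits(Acc * 10 + (C - $0), Rest);
digits(Acc, Rest) ->
    {Acc, Rest}.

E.g. digits_sketch:tokens("10 7") returns [10,7].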
Adam Krupicka
2015-08-28 19:29:58 UTC
Permalink
Post by Roelof Wobben
7> c(time_parser). {ok,time_parser}
8> time_parser:tokens([10]).
** exception error: no function clause matching time_parser:tokens("\n")
(time_parser.erl, line 5)
where do the /n come from ?
Hi! 10 is the decimal ASCII code for the newline character. This is how
strings are usually represented in Erlang: as a list of code points.

If you want to have the parser parse the integer 10, you probably
need to supply it as a string:

9> time_parser:tokens("10").

"10" is in fact syntax sugar for [$1, $0]. The dollar character($)
followed by a character is evaluated to the numerical value of the
character. For example:

1> $\n.
10
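 
And, in the same spirit (an illustrative session; the prompt numbers
are arbitrary):

2> "10" =:= [$1, $0].
true
3> "10" =:= [49, 48].
true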


Cheers,
A. K.
Richard A. O'Keefe
2015-08-31 04:25:17 UTC
Permalink
Specifically concerning the parsing of JSON, here are the actual
numbers for my json.erl.

% SIZE         Lines  Functions  Clauses
% entry            5          1        1
% tokenising      77          6       29   (or 36, if you count 'case')
% parsing         28          3        6   (or 12, if you count 'case')
% utility          4          1        2
% TOTAL          114         11       38   (or 51, if you count 'case')

It was amusing that *parsing* JSON required just 3 functions
(parse any JSON value, parse body of non-empty array, parse
body of non-empty object). It was interesting to see where
the difficulty was: numbers (four functions) and strings
(handling backslash escapes). Strings can appear in two places
in the JSON grammar, so having them recognised once in the
tokeniser avoids duplication in the parser.
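 
For a rough idea of what that three-function shape can look like --
this is not the code being described, just an invented sketch that
assumes the tokeniser produces atoms for punctuation and tagged tuples
for strings and numbers:

-module(json_sketch).
-export([parse/1]).

%% parse(Tokens) -> {Value, RemainingTokens}
parse([{string, S} | Rest]) -> {S, Rest};
parse([{number, N} | Rest]) -> {N, Rest};
parse([true  | Rest]) -> {true, Rest};
parse([false | Rest]) -> {false, Rest};
parse([null  | Rest]) -> {null, Rest};
parse(['[', ']' | Rest]) -> {[], Rest};
parse(['[' | Rest]) -> parse_array(Rest, []);
parse(['{', '}' | Rest]) -> {#{}, Rest};
parse(['{' | Rest]) -> parse_object(Rest, #{}).

%% Body of a non-empty array.
parse_array(Tokens, Acc) ->
    {Value, Rest} = parse(Tokens),
    case Rest of
        [',' | Rest1] -> parse_array(Rest1, [Value | Acc]);
        [']' | Rest1] -> {lists:reverse([Value | Acc]), Rest1}
    end.

%% Body of a non-empty object.
parse_object([{string, Key}, ':' | Tokens], Acc) ->
    {Value, Rest} = parse(Tokens),
    case Rest of
        [',' | Rest1] -> parse_object(Rest1, Acc#{Key => Value});
        ['}' | Rest1] -> {Acc#{Key => Value}, Rest1}
    end.

Feeding it the token list for [1, 2], i.e.
['[', {number,1}, ',', {number,2}, ']'], gives {[1,2], []}.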

Garrett Smith
2015-08-28 11:23:33 UTC
Permalink
Post by Roelof Wobben
Post by Richard A. O'Keefe
Post by Roelof Wobben
I will take the challenge but im stuck at the types part.
By far the easiest way to convert my Haskell sample code
to Erlang is to throw the types completely away, or just
leave them as comments.
Post by Roelof Wobben
-module(time_parser).
-export([]).
-type token :: tInt()
| tWord()
| tSlash()
| tDash()
| tComma().
-type tint() :: integer().
-type tword() :: binary().
-type tSlash() :: binary().
-type tDash() :: binary().
-type tComma() :: binary().
Leaving the omitted () aside, this isn't even CLOSE to a
good translation of the Haskell data type.
This makes tword -- should have been tWord and both of
them should be t_word in idiomatic Erlang -- and tSlash
and tDash and tComma the *same* type. But the whole point
is to make them DIFFERENT.
I have made them the same type because all three are only 1 character.
I have used the code from fiffy as reference to make my own.
I understand that they must be different but on some point they are the
same.
Post by Richard A. O'Keefe
-type token
:: {int,integer()}
| {word,string()} %% NOT binary!
| '/'
| '-'
| ','.
The alternatives MUST be such that they cannot be
confused with one another.
tokens([]) -> [];
tokens([C|Cs]) when C =< 32 -> tokens(Cs);
tokens([C|Cs]) when $0 =< C, C =< $9 -> digits(Cs, C-$0);
tokens([C|Cs]) when $a =< C, C =< $z -> word(Cs, [C]);
tokens([C|Cs]) when $A =< C, C =< $Z -> word(Cs, [C]);
tokens("/"++Cs) -> ['/' | tokens(Cs)];
tokens("-"++Cs) -> ['-' | tokens(Cs)];
tokens(","++Cs) -> [',' | tokens(Cs)].
Pity , you gave me the answer. Now I can do a copy/paste and go on and the
next time I do it wrong again.
That is why I want to try this one on my own and make my own mistakes and
learn from it , make more mistake and also learn from them.
No no no no. That's not the way to think about this IMO.

In software you learn from your failures, but you learn *a lot more*
from others' successes!

I had originally suggested that you start learning about parser
generating by giving you a complete working example, albeit a simple
one. That's not depriving you of learning, it's giving you a road map
for learning. With a working model you can start to tinker with the
code, *use it*, start to change little bits of it to experiment with
it and see how things change. IMO that's a learning process, it
doesn't give anything away or make anything less valuable - to the
contrary.

I haven't read this thread carefully enough to know what this new
challenge is, but I'd go back to the challenge I suggested earlier,
which is to create a simple calculator. Use the hint - just use it!
(Maybe you have already and have moved on to your next challenge -
again, I'm not reading carefully here.) Then play with the stuff
that's *working* and start to improve it (or simplify it even) - you
should not move on to any new challenges until you know that code
inside and out and understand what each and every character is there
for. And IMO that requires exercising it iteratively as you make
changes.

And please be sensitive to the challenges of your readers here - you
are asking for help on specific topics and the temptation is to help
you. You've got the best humans on earth (excluding me, work in
progress) reading your challenges and thinking carefully about your
learning process and the issues you're facing - and then taking a
great deal of time and care to respond constructively. If someone
slips up and gives you an answer here or there, please forgive them as
they are only trying to help you.
o***@cs.otago.ac.nz
2015-08-29 11:56:13 UTC
Permalink
Post by Roelof Wobben
I have made them the same type because all three are only 1 character.
I am suddenly feeling very cross.
YOU WANT TO BE ABLE TO TELL AT A GLANCE WHAT KIND OF TOKEN YOU
HAVE. That means *NOT* hiding stuff inside a binary.
Post by Roelof Wobben
I have used the code from fiffy as reference to make my own.
I understand that they must be different but on some point they are the
same.
No, and no, and NO. THEY ARE DIFFERENT. The fact that they
*happened* to be represented by single characters in the input
is UTTERLY UNINTERESTING. You have LESS THAN NO INTEREST in
knowing what the characters were. It's like the way in Pascal
"[" and "(*" were different *character sequences* but the same
*token*. When you are dealing with TOKENS, you not only do not
want to know anything about the characters, you WANT NOT TO KNOW
any thing about the characters.

Here again is the Haskell type declaration:

data Token              -- a Token is
    = TInt Int          -- an integer, or
    | TWord String      -- a word, or
    | TDash             -- a dash, or
    | TSlash            -- a slash, or
    | TComma            -- a comma.

When we are dealing with tokens, WE NEED TO BE ABLE TO TELL
ONE KIND OF TOKEN FROM ANOTHER BY A SINGLE PATTERN MATCH.
That's what a sum-of-products data type is all about; it's
a thing that lets us tell what we have by a single 'case'
analysis.

When we are dealing with tokens, we have made that choice
so we don't have to deal with characters. We don't *care*
whether an integer was written as 10, 010, 000000000000010,
or in another context, 2r1010, 3r101, 0xA, $\n, ...

Notice that a TInt token has some associated information,
and a TWord token has some associated information, in both
cases *derived from* but *not identical to* their source
characters, but a TDash, a TSlash, or a TComma have *NO*
associated information. They are NOT associated with any
character or string. As far as the rest of the program is
concerned, IT DOES NOT MATTER whether TDash stands for
U+002D, U+2010, U+2011, U+2012, U+2013, U+2014, U+2052,
U+2053, U+2448, U+2212, or whatever. That information is
*GONE*, and it's gone because we WANT it gone. We need to
be able to tell one token from another and to recover any
important information, BUT WE HAVE NO INTEREST IN WHAT
THE TOKENS LOOKED LIKE ANY MORE.

Like I said before, this separation of concerns between a
stage where we *do* have to care about the textual
representation of tokens and a stage where we can heave a
huge sigh of relief and forget that rubbish is one of the
reasons why we make a distinction between character
sequences and token sequences in the first place; it's one
of the reasons why I have no desire ever to use "scannerless"
parsing technology.

Now we want to map that token representation into Erlang.
And we DON'T start by writing -type declarations.
We start by saying "There are five situations that I want
to be able to discriminate with a single 'case' in Erlang.
Two of them have one item of associated information each,
and the other three have no associated information."

The first thing to do is to sort these things into groups
where all the situations in each group have the same
number of pieces of associated information.
Situations with NO associated information can (and should!)
be represented by atoms.
Situations with N pieces of associated information should
normally be represented by tuples with N+1 elements, the
first being an atom.
The atoms within a group MUST be different, so that a
single 'case' can trivially distinguish the situations.
All the atoms SHOULD be different so that people can make
sense of them.

So we end up with

TInt i {int,I}
TWord w {word,W}
TDash dash % I used '-' before
TSlash slash % I used '/' before
TComma comma % I used ',' before

I've used different atoms this time to make the point
that there is NO necessary connection between the names
we use to distinguish the cases and the spelling of any
of the tokens.

It does not make sense, in *any* programming language,
to associate a binary with the dash, slash, or comma
tokens. We DO need to know what kind of token we are
dealing with. We do NOT need to know how it was spelled.

Suppose we were doing this in C. We might have

enum Tag {T_Int, T_Word, T_Dash, T_Slash, T_Comma};

typedef struct Token {
    enum Tag tag;
    union {
        int val;         // Used only when tag == T_Int
        char const *str; // Used only when tag == T_Word
        /* NOTHING */    // T_Dash, T_Slash, T_Comma
    } u;
} Token;

This really isn't about Erlang, in fact. It is wrong in
*any* language to represent three *different* tokens by
the *same* thing.

Now that we've figured out that we want to use
{int,I}
{word,"w..."}
dash
slash
comma
to represent the different tokens, *NOW* you can write
a type declaration that expresses this.

-type token()
:: {int,integer()}
| {word,string()}
| dash
| slash
| comma.
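 
An invented example of why this representation pays off: a later stage
can pattern match directly on the token structure. For instance, a
hypothetical clause (in whatever module consumes the tokens) that turns
the tokens for "25/8/2015" into a date tuple:

%% Hypothetical, for illustration only.
to_date([{int, Day}, slash, {int, Month}, slash, {int, Year}]) ->
    {Year, Month, Day}.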

This is not a matter of taste or style.
Any time you write a type union in Erlang where two of the
alternatives even overlap, you should get worried. Because
that means you have an *ambiguous* type; one where there
is some value such that you cannot tell which of the
alternatives it belongs to on the basis of its form. This
is not always a mistake, but "mistake" is the way to bet.
Amongst other things, while the *computer* may be able to
work it out, ambiguous alternatives are situations where
*people* are likely to be confused.

Oh yeah, the other thing. There is only ONE type introduced
here. There was ONE type in the Haskell code; that should
have been a clue that ONE type was probably the right thing
in the Erlang translation. 'dash' and 'comma' are distinct
VALUES of the token() type; they are not usefully to be
thought of as distinct TYPES.
Post by Roelof Wobben
Pity , you gave me the answer. Now I can do a copy/paste and go on and
the next time I do it wrong again.
That is why I want to try this one on my own and make my own mistakes
and learn from it , make more mistake and also learn from them.
On present showing, you have nothing to worry about on that
score, and I hope you have learned something about designing
data types.
Richard A. O'Keefe
2015-08-31 00:27:23 UTC
Permalink
Post by o***@cs.otago.ac.nz
knowing what the characters were. It's like the way in Pascal
"[" and "(*" were different *character sequences* but the same
*token*.
What nonsense! "(*" is equivalent (6.1.8) to "{", not "[".
The equivalent of "[" is "(." (6.1.9), not "(*".

This howler doesn't invalidate the main point.
Roelof Wobben
2015-08-28 10:59:55 UTC
Permalink
Post by Bengt Kleberg
Greetings,
Dialyzer is not my forte, but on line 5 you have a type (token)
without () after it. All other types have () as suffix.
Could this be a problem?
bengt
I do not think so.
 
When I changed it to this:

-module(time_parser).

-export([]).

-type tint() :: integer().
-type tword() :: binary().
-type tSlash() :: binary().
-type tDash() :: binary().
-type tComma() :: binary().


-type token() :: tInt()
| tWord()
| tSlash()
| tDash()
| tComma().


I see these error messages:

time_parser.erl:12: type tInt() undefined
time_parser.erl:13: type tWord() undefined
time_parser.erl:5: Warning: type tint() is unused
time_parser.erl:6: Warning: type tword() is unused
time_parser.erl:12: Warning: type token() is unused
error

Roelof
Kevin Montuori
2015-08-28 11:50:47 UTC
Permalink
rw> -type tword() :: binary().
rw> -type token() :: tInt()
rw> | tWord()

rw> time_parser.erl:12: type tInt() undefined time_parser.erl:13: type
rw> tWord() undefined
rw> time_parser.erl:6: Warning: type tword() is unused

[Leaving aside all of the excellent advice that's been proffered...]

Did you read these messages carefully? tWord() is undefined but tword()
is unused: the answer's staring you in the face. If you've been writing
Erlang for a while this error/warning pair should ring a bell.

Perhaps you should revisit Mr. O'Keefe's email, specifically the part
where he discusses Erlang naming conventions (and tells you *exactly*
what's wrong with what you have)?


k.
--
Kevin Montuori
***@gmail.com
Roelof Wobben
2015-08-28 12:34:36 UTC
Permalink
Post by Kevin Montuori
rw> -type tword() :: binary().
rw> -type token() :: tInt()
rw> | tWord()
rw> time_parser.erl:12: type tInt() undefined time_parser.erl:13: type
rw> tWord() undefined
rw> time_parser.erl:6: Warning: type tword() is unused
[Leaving aside all of the excellent advice that's been proffered...]
Did you read these messages carefully? tWord() is undefined but tword()
is unused: the answer's staring you in the face. If you've been writing
Erlang for a while this error/warning pair should ring a bell.
Perhaps you should revisit Mr. O'Keefe's email, specifically the part
where he discusses Erlang naming conventions (and tells you *exactly*
what's wrong with what you have)?
k.
With the help of everything said here, I got this piece working.

-module(time_parser).

-export([]).

-type token() :: t_Int()
| t_Word()
| '/'
| '-'
| ','.

-type t_Int() :: integer() | int.
-type t_Word() :: word | string().

So in normal English: a token can be a t_Int, a t_Word, a '/', a '-' or
a ','. If it is something else, it fails, which is all right.
Type t_Int can be an integer or an atom called int.
Type t_Word can be a string or an atom called word.
If it is something else, it fails.

Roelof
o***@cs.otago.ac.nz
2015-08-29 12:13:43 UTC
Permalink
Post by Roelof Wobben
-type t_Int() :: integer() | int.
-type t_Word() :: word | string().
First, there should not be any t_Int() or t_Word() type at all.

Second, you are saying here that
"Something is a t_Int() if and only if
(1) it is an integer() or
(2) it is the atom 'int'."

That does not make sense. Going back to the Haskell
example,
data Token
= TInt Int | ...
says
"Token is a new type.
One kind of token is called TInt; that kind has
an Int inside it. ...."
TInt here is NOT A TYPE (the "T" prefix stood for "token",
not "type"; Haskell has no type prefix convention),
it is a constructor function.
It's a *label* pasted onto a record to distinguish it
from all the other kinds of token. I could write this
type in Pascal:

type Token = record
    case tag : (TInt, TWord, TDash, TSlash, TComma) of
        TInt  : (val : Integer);
        TWord : (Str : Alpha);
        TDash : ();
        TSlash: ();
        TComma: ()
    end;

The thing is that we want to be able to recognise an integer
token by PATTERN MATCHING (as in the example Haskell code),
*not* by type testing with a guard. There's nothing wrong
with type testing with a guard when we have to, but we
prefer pattern matching when we can because it is easier for
*us* as well as the computer to tell when two patterns do not
overlap.
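 
A small invented illustration of that difference:

%% Pattern matching on the token shape: the clauses visibly cannot overlap.
value_of({int, I})  -> I;
value_of({word, W}) -> W.

%% Type testing with guards: the reader has to convince themselves that
%% the guards cannot both succeed for the same value.
value_of2(X) when is_integer(X) -> X;
value_of2(X) when is_list(X)    -> X.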

We want to say "Here is an int token AND here is its value",
as in {int,integer()}, not "here is an int token OR here is
its value". But using integer() | int you are saying
"an integer OR the atom 'int'". And there is no case where
the input is going to justify a "token" that is, in its
entirety, 'int'.

It would be really helpful if Erlang had distinct notation
for "this is a union of types that are meant to be disjoint"
and "this is a union of types that I expect to overlap".

Look, it seems as though you don't yet have a clue how types
in Erlang work. FORGET THEM until you have mastered
pattern matching and the design of data structures that are
meant to be used through pattern matching.