[darcs-users] Line endings opinion poll (with bonus opinion)
Michael Conrad
2004-10-29 05:44:56 UTC
So, one thing that always drove me nuts is the issue of editing alternately
between windows and unix. At least, before I found that I could tell
JBuilder to use \n always rather than platform default.

However, the issue is still out there for some people (as seen in the
previous post), and until they notice it, it causes the patching process to
go nuts; mainly a lot of full-file-replaces, and missed opportunities for
merging. And, I just did it to myself again yesterday. ;-)

So, the poll: There are two main camps (I think) on how lines should be
handled:
1: All spacing and formatting and control characters are part of the file,
and should not be modified by a revision control program. If it were
modified, it could end up with inconsistent results when copying between
systems and diffing by hand.

2: "ASCII Text" is a protocol which different systems speak differently. It
consists of lines of printable characters which are delimited by
system-specific control characters. A revision control system should use
whatever notion of "text" the host operating system uses, and translate
whenever data comes or goes from the system.

So, what's everyone's opinion? I personally prefer #2, since it prevents
nastiness in my patches, and fixing-line-endings is something that you
expect to deal with when directly copying between diverse systems. (In an
implementation, I would expect that all transmission is done using \n, and
then windows platforms would add \r when applying, and remove it when
patching) I imagine it would be easy to implement, though it would then
probably be requested "optional", and then need a default, and
documentation, and a help entry...

-Mike
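
A minimal sketch of the translation proposed above, assuming the repository side always holds \n and a Windows working copy wants \r\n. The function names lf_to_crlf and crlf_to_lf are invented for illustration; nothing here is taken from darcs itself.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Repository form -> Windows working copy: every "\n" becomes "\r\n".
   Returns a malloc'd NUL-terminated string that the caller must free. */
char *lf_to_crlf(const char *in) {
    assert(in != NULL);
    size_t n = strlen(in);
    char *out = malloc(2 * n + 1);   /* worst case: every byte is '\n' */
    if (!out) return NULL;
    size_t j = 0;
    for (size_t i = 0; i < n; i++) {
        if (in[i] == '\n') out[j++] = '\r';
        out[j++] = in[i];
    }
    out[j] = '\0';
    return out;
}

/* Windows working copy -> repository form: every "\r\n" becomes "\n".
   A lone '\r' that is not followed by '\n' is left untouched. */
char *crlf_to_lf(const char *in) {
    assert(in != NULL);
    size_t n = strlen(in);
    char *out = malloc(n + 1);
    if (!out) return NULL;
    size_t j = 0;
    for (size_t i = 0; i < n; i++) {
        /* in[i + 1] is at worst the terminating NUL, so this read is safe */
        if (in[i] == '\r' && in[i + 1] == '\n') continue;  /* drop the '\r' */
        out[j++] = in[i];
    }
    out[j] = '\0';
    return out;
}
```

Note that lf_to_crlf assumes its input really is in \n form: text that already contains \r\n would come out as \r\r\n, which is one reason the direction of each conversion has to be known.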
Nigel Rowe
2004-10-29 07:04:24 UTC
Post by Michael Conrad
So, one thing that always drove me nuts is the issue of editing alternately
between windows and unix.
Amen!

<snip>
Post by Michael Conrad
2: "ASCII Text" is a protocol which different systems speak differently.
It consists of lines of printable characters which are delimited by
system-specific control characters. A revision control system should use
whatever notion of "text" the host operating system uses, and translate
whenever data comes or goes from the system.
So, whats everyone's opinion? I personally prefer #2, since it prevents
nastiness in my patches, and fixing-line-endings is something that you
expect to deal with when directly copying between diverse systems. (In an
implementation, I would expect that all transmission is done using \n, and
then windows platforms would add \r when applying, and remove it when
patching) I imagine it would be easy to implement, though it would then
probably be requested "optional", and then need a default, and
documentation, and a help entry...
-Mike
#2, Pleeeeease. I don't even care what the default is, if it can be specified
in _darcs/prefs/.

Of course this doesn't apply to any file that matches a pattern in
_darcs/prefs/binaries.


- --
Nigel Rowe
***@swiftdsl.com.au
Sean Perry
2004-10-29 07:12:09 UTC
Post by Michael Conrad
So, the poll: There are two main camps (I think) on how lines should be
1: All spacing and formatting and control characters are part of the file,
and should not be modified by a revision control program. If it were
modified, it could end up with inconsistent results when copying between
systems and diffing by hand.
definitely. I give darcs my data and I expect it to just take it and
store it.
Peter Strand
2004-10-29 09:42:02 UTC
Post by Michael Conrad
So, the poll: There are two main camps (I think) on how lines should be
1: All spacing and formatting and control characters are part of the file,
and should not be modified by a revision control program. If it were
modified, it could end up with inconsistent results when copying between
systems and diffing by hand.
2: "ASCII Text" is a protocol which different systems speak differently. It
consists of lines of printable characters which are delimited by
system-specific control characters. A revision control system should use
whatever notion of "text" the host operating system uses, and translate
whenever data comes or goes from the system.
So, whats everyone's opinion?
#2 as default with support for #1.
(I can live with "#1 with support for #2" as well)

Usually it's preferable to use the system-specific format of text files,
and I think that it is the revision control system's job to do the
conversion. However, it is sometimes nice to be able to use an alternative
format; for example, when sharing a filesystem between unix and windows
I usually use unix-encoding for all files.


I have an almost-finished patch to darcs which adds (primitive) support
for windows newlines when reading and writing text files. It basically
just adds a --convert-newlines option to get and then keeps track of the
format for subsequent operations.
I don't know if that is a good interface..?



/Peter
David Roundy
2004-10-29 11:34:37 UTC
Post by Peter Strand
Usually it's preferably to use the system-specific format of text-files,
and I think that it is the revision control system's job to do the
conversion. However, it is sometimes nice to be able use an alternative
format, for example when sharing a filesystem between unix and windows
I usually use unix-encoding for all files.
I have an almost-finished patch to darcs which adds (primitive) support
for windows newlines when reading and writing text files. It basically
just adds a --convert-newlines option to get and then keeps track of the
format for subsequent operations..
I don't know if that is a good interface..?
Sounds good. On this subject I'm willing to defer to people who actually
use windows... :)
--
David Roundy
http://www.abridgegame.org
Kevin Ollivier
2004-10-29 16:08:59 UTC
Hi,
Post by Peter Strand
Post by Michael Conrad
So, the poll: There are two main camps (I think) on how lines should be
1: All spacing and formatting and control characters are part of the file,
and should not be modified by a revision control program. If it were
modified, it could end up with inconsistent results when copying between
systems and diffing by hand.
2: "ASCII Text" is a protocol which different systems speak differently. It
consists of lines of printable characters which are delimited by
system-specific control characters. A revision control system should use
whatever notion of "text" the host operating system uses, and translate
whenever data comes or goes from the system.
So, whats everyone's opinion?
#2 as default with support for #1.
(I can live with "#1 with support for #2" as well)
This is my opinion as well.
Post by Peter Strand
Usually it's preferably to use the system-specific format of text-files,
and I think that it is the revision control system's job to do the
conversion. However, it is sometimes nice to be able use an alternative
format, for example when sharing a filesystem between unix and windows
I usually use unix-encoding for all files.
Yes, I think there are valid arguments for both cases, and I think in
that case the best thing to do is provide options for both.
Post by Peter Strand
I have an almost-finished patch to darcs which adds (primitive) support
for windows newlines when reading and writing text files. It basically
just adds a --convert-newlines option to get and then keeps track of the
format for subsequent operations..
I don't know if that is a good interface..?
It looks good to me. Is there a way to make it the default though? (in
other words, all get operations automatically pass --convert-newlines)
It's not critical but it would be nice.

Thanks,

Kevin
Samuel A. Falvo II
2004-10-30 00:23:16 UTC
Post by Michael Conrad
2: "ASCII Text" is a protocol which different systems speak differently. It
consists of lines of printable characters which are delimited by
system-specific control characters. A revision control system should use
whatever notion of "text" the host operating system uses, and translate
whenever data comes or goes from the system.
So, whats everyone's opinion?
There are currently three methods of encoding ASCII text. DOS, Unix, and
MacOS Classic. I don't know if MacOS X uses Unix-style line endings or
MacOS-style endings. Windows systems are schizophrenic between Unix
and DOS styles in many cases, but seem to prefer DOS over Unix by a
relatively small margin.

This means that darcs would have to be able to somehow detect or store as
metadata which kind of system is in use when a file is initially checked
in, and remember to convert to and from that format as appropriate.

In the interest of making things easier for darcs, I suggest having a
*common* form in which *all* text files take inside the repository,
regardless of which platform darcs is running on. This will make
transferring repositories between platforms easier. I suggest the
text transmission layout as used by RFC822, for example. Basically,
this is DOS-style, and is the original style recommended by ASCII (heh,
the ONLY standard Microsoft has adopted, it seems). That is, lines end
with CR/LF.

For Darcs running on Unix or MacOS platforms, then, knowing that a file
is plain ASCII, it can convert between its native format and the
standard format effortlessly, without having to worry who the intended
target is. This is a case of supporting N systems through N conversion
routines, instead of N*(N-1).

Just my opinion on the subject. I otherwise don't care much. :) Thanks
for reading.
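
A rough sketch of the "one common form" idea, under the assumption that a single routine may accept any of the three endings on input and emit one chosen style on output. The enum and function names are hypothetical, and this folds "to common form" and "from common form" into one pass rather than claiming how darcs would do it.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

enum eol_style { EOL_UNIX, EOL_DOS, EOL_MAC };  /* "\n", "\r\n", "\r" */

static const char *eol_bytes(enum eol_style s) {
    switch (s) {
    case EOL_DOS: return "\r\n";
    case EOL_MAC: return "\r";
    default:      return "\n";
    }
}

/* Rewrite every line ending found in `in`, whatever its style, as `target`.
   Returns a malloc'd NUL-terminated string that the caller must free. */
char *convert_eol(const char *in, enum eol_style target) {
    assert(in != NULL);
    const char *eol = eol_bytes(target);
    size_t eol_len = strlen(eol);
    size_t n = strlen(in);
    char *out = malloc(2 * n + 1);   /* each input byte yields at most 2 */
    if (!out) return NULL;
    size_t j = 0;
    for (size_t i = 0; i < n; i++) {
        if (in[i] == '\r' || in[i] == '\n') {
            /* "\r\n" counts as one ending, so consume both bytes */
            if (in[i] == '\r' && in[i + 1] == '\n') i++;
            memcpy(out + j, eol, eol_len);
            j += eol_len;
        } else {
            out[j++] = in[i];
        }
    }
    out[j] = '\0';
    return out;
}
```

With a shared form in the middle, adding a new platform means adding one more case to eol_bytes, not a new converter for every existing platform.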
David Brown
2004-10-30 02:16:42 UTC
Post by Samuel A. Falvo II
There are currently three methods of encoding ASCII text. DOS, Unix, and
MacOS Classic. I don't know if MacOS X uses Unix-style line endings or
MacOS-style endings.
On Mac OS X: some of both. Most things seem to be moving toward Unix-type
endings, although there are still a bunch of files around that use '\r'
instead of '\n' (classic). Usually applications built to work with either
will use the '\r' endings. Most new development will use '\n' since it
fits better with the rest of the world.

I would guess that nearly anyone using OSX needing darcs would have
Unix-type endings.

More of a concern might be that OSX defaults to a case-insensitive
filesystem. Sometimes people create makefiles that have the wrong case,
and it works. But that is more of a configuration issue than a darcs
issue.

Where it does become an issue is if files are checked in under multiple
casings of a given path.

foo/file1.c
Foo/file2.c

will end up in separate directories on other systems.

Dave
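
A small sketch of how such a clash could be detected, assuming plain ASCII paths with '/' separators. The helper names are made up for this example, and nothing here describes what darcs actually checks.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Length of the directory part of `path` (everything before the last '/'). */
static size_t dir_len(const char *path) {
    const char *slash = strrchr(path, '/');
    return slash ? (size_t)(slash - path) : 0;
}

/* Returns 1 when the two directory parts are spelled differently but become
   equal once ASCII letter case is ignored, e.g. "foo" versus "Foo". */
int dirs_clash_by_case(const char *a, const char *b) {
    assert(a != NULL && b != NULL);
    size_t la = dir_len(a), lb = dir_len(b);
    if (la != lb) return 0;
    int differ = 0;
    for (size_t i = 0; i < la; i++) {
        if (a[i] == b[i]) continue;
        if (tolower((unsigned char)a[i]) != tolower((unsigned char)b[i]))
            return 0;                /* genuinely different directories */
        differ = 1;                  /* same letter, different case */
    }
    return differ;
}
```

For the two paths above, dirs_clash_by_case("foo/file1.c", "Foo/file2.c") reports a clash, while two files in the identically spelled directory do not.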
Björn Lindström
2004-10-30 13:33:13 UTC
Post by David Brown
Where it does become an issue is if files are checked in under multiple
casings of a given path.
foo/file1.c
Foo/file2.c
will end up in separate directories on other systems.
Not really, since darcs doesn't allow file names that only differ in
case in the same repository.

As for my vote on the issue as a whole, I don't really care how you do
it, as long as

a) the format of the darcs repository is not changed

b) it doesn't affect my work with darcs on Unix (which is all I ever
use) in any other way.
Michael Conrad
2004-10-30 15:49:53 UTC
Post by Samuel A. Falvo II
Post by Michael Conrad
2: "ASCII Text" is a protocol which different systems speak differently. It
consists of lines of printable characters which are delimited by
system-specific control characters. A revision control system should use
whatever notion of "text" the host operating system uses, and translate
whenever data comes or goes from the system.
So, whats everyone's opinion?
This means that darcs would have to be able to somehow detect or store as
metadata which kind of system is in use when a file is initially checked
in, and remember to convert to and from that format as appropriate.
Woah- that wasn't quite what I had in mind. I was thinking along the lines
of having a compiled instance of darcs with a
"platform-default-line-encoding" in it. This default would be determined at
compile time, and could be overridden with a prefs flag (global or local).
Any patch applied using that copy of darcs would receive the platform's
encoding, and all recordings would be written with the standard repo
encoding.
Post by Samuel A. Falvo II
In the interest of making things easier for darcs, I suggest having a
*common* form in which *all* text files take inside the repository,
regardless of which platform darcs is running on. This will make
transferring repositorities between platforms easier. I suggest the
text transmission layout as used by RFC822, for example. Basically,
this is DOS-style, and is the original style recommended by ASCII (heh,
the ONLY standard Microsoft has adopted, it seems). That is, lines end
with CR/LF.
So I guess you were thinking the same thing, kind of.

I would vote for \n unix lines, since it is the way things already are.
( and has the bonus of saving one character per line ;-] )
Standards are nice, but holy wars speak louder :-)

Also, let me offer a use-case:
Hacker Hal is in a project with newbie Ned. Hal uses unix, and offers to
host a repo for the project. He insists that Ned use darcs. Ned uses
windows. Ned goes out to the web and finds a windows copy of darcs and
installs it, along with a GUI frontend :-) Ned then manages to make some
changes and commit them, having no knowledge of what mode his editor is
using. Hal reviews the changes using a viewer which doesn't happen to show
^M. (new versions of many viewers, including less, are hiding them by
default) Hal applies them to the repo, and continues working on the project.
One day, Hal opens up a file in an old copy of vi, and notices that there
are ^M all over the place. He screams and curses the day that Ned was born,
to no avail.

So, the easiest solution would be for windows versions of darcs to
automatically strip \r. Another solution is to make Ned use an editor
capable of unix newlines, and get Ned to turn on that feature, which might
have to be enabled on a per-project basis. I think we all know the futility
of the second option. It's usually effort enough just to get Ned to use
version control.

In the case that you have a samba-mounted home directory, you could of
course manually add the pref for line endings, since this is something that
you deal with and understand.

-Mike
Bryce Wilcox-O'Hearn
2004-10-30 23:27:49 UTC
Post by Michael Conrad
Hacker Hal is in a project with newbie Ned.
...
Post by Michael Conrad
One day, Hal opens up a file in an old copy of vi, and notices that there
are ^M all over the place. He screams and curses the day that Ned was born,
to no avail.
This story persuades me that darcs should leave line-endings untouched,
as it is not its place to be modifying such things.

If you want revision control, use darcs (or monotone, or ...). If you
want line-endings regularized, run a script which processes text files
and regularizes their line-endings. If you want tabs converted into
sequences of four spaces, run a script that does that. Etc.

--Z

---
Please excuse terse writing -- there is a baby in my arms.
Michael Conrad
2004-10-31 02:40:25 UTC
Post by Bryce Wilcox-O'Hearn
If you want revision control, use darcs (or monotone, or ...). If you
want line-endings regularized, run a script which processes text files
and regularizes their line-endings. If you want tabs coverted into
sequences of four spaces, run a script that does that. Etc.
The problem remains that the mess will still get recorded, and then need to
be fixed and a new patch generated, which will likely cause conflicts with
Ned's repo.

I had forgotten the classic issue of tabbing/spacing, though; that's another
annoying one. (The working files can be fixed with a script, but not the
patches)

What about the custom-diff idea? If I could somehow pass -b to diff, it
would solve the irritation: darcs would only see changes involving real
edits, and not whitespace finagling. I forget if this will solve the ^M
problem as well, but I could always hack up my own version of diff.

I'm thinking that darcs would remain consistent if -b were applied to ALL of
its calls to diff. Can anyone think of an exception?

-Mike
Mark Stosberg
2004-10-31 02:45:40 UTC
Post by Michael Conrad
If I could somehow pass -b to diff
Go-go gadget documentation:
http://www.abridgegame.org/darcs/manual/node7.html#SECTION00781000000000000000

(Darcs does allow you to pass "-b" to diff, at least with darcs diff).

Sorry for the Inspector Gadget cartoon reference, for those of you who missed
the show.

Mark
--
http://mark.stosberg.com/
Michael Conrad
2004-10-31 04:23:06 UTC
Post by Michael Conrad
The problem remains that the mess will still get recorded, and then need to
be fixed and a new patch generated, which will likely cause conflicts with
Ned's repo.
I had forgotten the classic issue of tabbing/spacing, though; thats another
annoying one. (The working files can be fixed with a script, but not the
patches)
What about the custom-diff idea? If I could somehow pass -b to diff, it
would solve the irritation: darcs would only see changes involving real
edits, and not whitespace finagling. I forget if this will solve the ^M
problem as well, but I could always hack up my own version of diff.
I'm thinking that darcs would remain consistent if -b were applied to ALL of
its calls to diff. Can anyone think of an exception?
In order to give this a test, and not have to spend time learning haskell
(which I'll do one of these days, really!) I figured I'd just force diff to
always have the -b option, the direct way.
So I tried:

$ vi darcsdiff.c
(trimmed slightly for email)
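#include <stdlib.h>   /* calloc */
#include <unistd.h>   /* execv */
int main(int argc, char** argv) {
  int i;
  /* room for the extra "-b" plus the terminating NULL */
  char** newArgs= (char**) calloc(argc+2, sizeof(char*));
  if (!newArgs) return 1;
  newArgs[0]= argv[0];
  newArgs[1]= "-b";
  /* i<=argc also copies the terminating NULL stored in argv[argc] */
  for (i=1; i<=argc; i++)
    newArgs[i+1]= argv[i];
  execv("/usr/bin/truediff", newArgs);
  return 1; /* only reached if execv fails */
}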
$ gcc -o darcsdiff darcsdiff.c
# mv /usr/bin/diff /usr/bin/truediff
# mv darcsdiff /usr/bin/diff
$ mkdir temp && cd temp && darcs init
$ vi blah
12345
$ darcs record
$ vi blah
12345

And now, "diff -r --exclude=_darcs . _darcs/current" shows nothing, but
"darcs whatsnew" shows the change as usual. So isn't darcs using diff to
detect changes?

-Mike
David Roundy
2004-10-31 11:12:00 UTC
Post by Michael Conrad
And now, "diff -r --exclude=_darcs . _darcs/current" shows nothing, but
"darcs whatsnew" shows the change as usual. So isn't darcs using diff to
detect changes?
No, darcs uses its own code to work out changes.
--
David Roundy
http://www.abridgegame.org
Quag
2004-10-31 09:13:16 UTC
On Sun, 31 Oct 2004 00:23:06 -0400, Michael Conrad
<***@email.uc.edu> wrote:

[big-snip]
Post by Michael Conrad
And now, "diff -r --exclude=_darcs . _darcs/current" shows nothing, but
"darcs whatsnew" shows the change as usual. So isn't darcs using diff to
detect changes?
Bingo.

I think diff is only used for the output of the "darcs diff" command.
All other diffs are performed with darcs's own diffing implementation
and diff file format.

Jonathan.

Nigel Rowe
2004-10-31 08:21:01 UTC
Post by Michael Conrad
Post by Bryce Wilcox-O'Hearn
If you want revision control, use darcs (or monotone, or ...). If you
want line-endings regularized, run a script which processes text files
and regularizes their line-endings. If you want tabs coverted into
sequences of four spaces, run a script that does that. Etc.
I see the problem as more in the way of "what is the definition of a line of
text" than regularising the contents of the lines themselves.
Post by Michael Conrad
The problem remains that the mess will still get recorded, and then need to
be fixed and a new patch generated, which will likely cause conflicts with
Ned's repo.
I had forgotten the classic issue of tabbing/spacing, though; thats another
annoying one. (The working files can be fixed with a script, but not the
patches)
Tabs/spaces are part of the contents of the line. \n vs \r\n
vs \r define what separates lines within a *file*. And that is system
dependent.

The basic unit of a darcs hunk is a line of text. I think the definition of
the representation of a line of text, within a file under darcs control,
needs to be regularised.
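
One way to picture a regularised definition of "line" is a reader that hands back the content of each line and treats \n, \r\n and \r alike as terminators. The sketch below assumes an in-memory buffer; next_line is a hypothetical name and not a darcs function.

```c
#include <assert.h>
#include <stddef.h>

/* Finds the logical line that starts at `p` within [p, end).
   Stores the content length (terminator excluded) in *len and returns a
   pointer to the start of the following line, or `end` when none is left. */
const char *next_line(const char *p, const char *end, size_t *len) {
    assert(p != NULL && end != NULL && len != NULL && p <= end);
    const char *q = p;
    while (q < end && *q != '\n' && *q != '\r') q++;
    *len = (size_t)(q - p);
    if (q < end) {
        /* "\r\n" is a single terminator, so step over the '\r' first */
        if (*q == '\r' && q + 1 < end && q[1] == '\n') q++;
        q++;                         /* step over the terminator byte */
    }
    return q;
}
```

A patch format built on spans like these would see the same three lines in "one\r\ntwo\rthree\n" no matter which platform wrote the file.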
Post by Michael Conrad
What about the custom-diff idea? If I could somehow pass -b to diff, it
would solve the irritation: darcs would only see changes involving real
edits, and not whitespace finagling. I forget if this will solve the ^M
problem as well, but I could always hack up my own version of diff.
I'm thinking that darcs would remain consistent if -b were applied to ALL
of its calls to diff. Can anyone think of an exception?
Python. Leading whitespace is *very* significant. For that matter, in an
open triple-quote, trailing whitespace forms part of the literal.
Post by Michael Conrad
-Mike
Cheers,
Nigel


- --
Nigel Rowe
***@swiftdsl.com.au
Taral
2004-10-29 17:07:13 UTC
Post by Michael Conrad
2: "ASCII Text" is a protocol which different systems speak differently. It
consists of lines of printable characters which are delimited by
system-specific control characters. A revision control system should use
whatever notion of "text" the host operating system uses, and translate
whenever data comes or goes from the system.
As long as darcs uses line-based patches, it really should allow the
definition of "line" to vary by platform.

Put in a binary diffing system that works and I won't care.
--
Taral <***@taral.net>
This message is digitally signed. Please PGP encrypt mail to me.
A: Because it fouls the order in which people normally read text.
Q: Why is top-posting such a bad thing?
A: Top-posting.
Q: What is the most annoying thing on usenet and in e-mail?
Martin Schaffner
2004-10-29 18:37:18 UTC
Post by Sean Perry
Post by Michael Conrad
1: All spacing and formatting and control characters are part of the file,
and should not be modified by a revision control program. If it were
modified, it could end up with inconsistent results when copying between
systems and diffing by hand.
definitely. I give darcs my data and I expect it to just take it and
store it.
I second that. Please keep darcs simple! There are other tools for line
conversion. Sometimes, I push to a Windows filesystem mounted on my Mac
- should darcs convert in this case or not? What if I want to try out
other changes on the Windows mount before pulling them?
I don't really care if there's a --convert-newlines option, as long as
per default, darcs does not mangle my files.
People who want this option always switched on can then throw it into
_darcs/prefs/defaults.
Post by Michael Conrad
2: "ASCII Text" is a protocol which different systems speak differently. It
consists of lines of printable characters which are delimited by
system-specific control characters.
Most developer tools will allow for the unixy '\n' line ending, so it
should be easy to standardize on them.
Post by Taral
As long as darcs uses line-based patches, it really should allow the
definition of "line" to vary by platform.
Put in a binary diffing system that works and I won't care.
This would not help, as "\n" and "\r\n" are also different if
interpreted as binary, so you'd also get useless differences...
Dustin Sallings
2004-10-29 19:39:01 UTC
Post by Martin Schaffner
Sometimes, I push to a Windows filesystem mounted on my Mac - should
darcs convert in this case or not?
I think this is a very good point. Because darcs does not distinguish
repositories and working dirs, one's working dir isn't necessarily the
result of a specific checkout. Given that, it might make more sense to
have a separate conversion mechanism, and have the patch format only
deal with lines (regardless of endings) and store with consistent line
endings.

Then again, I only develop in UNIX, so my opinion may not be very
important.
--
Dustin Sallings
Martin Schaffner
2004-10-29 19:55:52 UTC
Post by Dustin Sallings
it might make more sense to have a separate conversion mechanism, and
have the patch format only deal with lines (regardless of endings) and
store with consistent line endings.
This idea assumes that files contain consistent lines to begin with -
which is not necessarily the case.
IMO it is problematic to have patches in the repo for which darcs does
not know how to apply them - with line conversion or without? What will
check/repair do? Will this depend on the prefs?
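
Whether a file has consistent line endings can at least be checked before any conversion is attempted. The following is only a sketch with made-up names (eol_counts, count_eols, has_mixed_eols); it tallies each kind of ending in a buffer and reports whether more than one kind is present.

```c
#include <assert.h>
#include <stddef.h>

struct eol_counts { size_t lf, crlf, cr; };

/* Count "\n", "\r\n" and lone "\r" endings in the first n bytes of buf. */
struct eol_counts count_eols(const char *buf, size_t n) {
    assert(buf != NULL || n == 0);
    struct eol_counts c = {0, 0, 0};
    for (size_t i = 0; i < n; i++) {
        if (buf[i] == '\r') {
            if (i + 1 < n && buf[i + 1] == '\n') {
                c.crlf++;
                i++;                 /* the '\n' belongs to this ending */
            } else {
                c.cr++;
            }
        } else if (buf[i] == '\n') {
            c.lf++;
        }
    }
    return c;
}

/* Returns 1 when more than one kind of line ending was seen. */
int has_mixed_eols(struct eol_counts c) {
    return (c.lf != 0) + (c.crlf != 0) + (c.cr != 0) > 1;
}
```

A tool could refuse to convert, or fall back to storing the bytes untouched, whenever has_mixed_eols reports a mixture.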

What speaks against always using '\n'? Are there editors on Windows
that do not understand the '\n'-ending (besides Notepad and edit.exe)?
Post by Dustin Sallings
Then again, I only develop in UNIX, so my opinion may not be very
important.
I use UNIX, Mac, and Windows, and I prefer to be able to understand
what happens if something goes wrong, instead of using automagic (and
sometimes wrong-guessing) complex mechanisms.

Martin
Mark Stosberg
2004-10-29 20:53:37 UTC
Post by Martin Schaffner
What speaks against always using '\n'? Are there editors on Windows
that do not understand the '\n'-ending (besides Notepad and edit.exe)?
Here's my perspective as a former Mac OS/Unix user (before Mac /became/
unix).

I used BBEdit as my editor. I had it set to "auto convert line endings"
when I opened them. I also had it set to save files with Unix line
breaks by default.

The result was that I could seamlessly open, view, edit and save Unix
files without a problem.

Now I work purely in Unixland, and don't have a strong opinion of #1 or
#2.

Mark
Will
2004-10-31 01:30:08 UTC
Post by Martin Schaffner
Post by Sean Perry
Post by Michael Conrad
1: All spacing and formatting and control characters are part of
the file, and should not be modified by a revision control
program. If it were modified, it could end up with inconsistent
results when copying between systems and diffing by hand.
definitely. I give darcs my data and I expect it to just take it and
store it.
I second that. Please keep darcs simple! There are other tools for
line conversion. Sometimes, I push to a Windows filesystem mounted on
my Mac - should darcs convert in this case or not? What if I want to
try out other changes on the Windows mount before pulling them?
I don't really care if there's a --convert-newlines option, as long as
per default, darcs does not mangle my files.
People who want this option always switched on can then throw it into
_darcs/prefs/defaults.
I'll throw my preference in with this sub-thread: CVS's automatic
conversion frequently trips me on win32 builds of unixy tools such as
emacs and I end up re-getting the repository with -kb.

I have yet to run into a case where I want newlines converted from
what they were at check-in, and if there ever is such a case I'll use
emacs or unix2dos.

Regards,
Will
Michael Conrad
2004-10-31 19:49:19 UTC
Post by Will
Post by Martin Schaffner
Post by Sean Perry
Post by Michael Conrad
1: All spacing and formatting and control characters are part of
the file, and should not be modified by a revision control
program. If it were modified, it could end up with inconsistent
results when copying between systems and diffing by hand.
definitely. I give darcs my data and I expect it to just take it and
store it.
I second that. Please keep darcs simple! There are other tools for
line conversion. Sometimes, I push to a Windows filesystem mounted on
my Mac - should darcs convert in this case or not? What if I want to
try out other changes on the Windows mount before pulling them?
I don't really care if there's a --convert-newlines option, as long as
per default, darcs does not mangle my files.
People who want this option always switched on can then throw it into
_darcs/prefs/defaults.
I have yet to run into a case where I want newlines converted from
what they were at check-in, and if there ever is such a case I'll use
emacs or unix2dos.
The issue wasn't about getting the checked-out line endings the way I want
them, but about fixing OTHERS' line endings BEFORE checkin. My story of Hal
& Ned was a simplified version of a school project I did where I had 2 team
members who were exclusively Windows users, and had never used version
control. I managed to convince them that version control was necessary, and
got them to install WinCVS. Of course, not too long down the road I realized
that all their checkins had ^M all over the place. I stripped these out,
but that created a lot of extra "patch info".

The heart of the issue is that they added "logical lines" to the project,
but CVS added line encodings. If darcs works with "logical lines"
internally, then everyone can have them written locally using the encoding
of their choice.

-Mike
Tomasz Zielonka
2004-10-30 06:45:33 UTC
Post by Michael Conrad
So, whats everyone's opinion?
I don't care what it does on windows or mac, as long as it stays the
same on unix, or at least can be configured to behave the same way.

Best regards,
Tom
--
.signature: Too many levels of symbolic links
Martin Schaffner
2004-10-30 19:22:32 UTC
Post by Michael Conrad
Post by Samuel A. Falvo II
This means that darcs would have to be able to somehow detect or store as
metadata which kind of system is in use when a file is initially checked
in, and remember to convert to and from that format as appropriate.
Woah- that wasn't quite what I had in mind. I was thinking along the lines
of having a compiled instance of darcs with a
"platform-default-line-encoding" in it. This default would be determined at
compile time,
This would mean that the version of darcs alone would not tell you what
behaviour it shows.
If a "default line ending" feature goes into darcs, then it should be
per-repo and not per-darcs-executable
Post by Michael Conrad
Any patch applied using that copy of darcs would receive the platform's
encoding, and all recordings would be written with the standard repo
encoding.
Post by Samuel A. Falvo II
In the interest of making things easier for darcs, I suggest having a
*common* form in which *all* text files take inside the repository,
regardless of which platform darcs is running on. This will make
transferring repositories between platforms easier. I suggest the
text transmission layout as used by RFC822, for example. Basically,
this is DOS-style, and is the original style recommended by ASCII (heh,
the ONLY standard Microsoft has adopted, it seems). That is, lines end
with CR/LF.
So I guess you were thinking the same thing, kind of.
I would vote for \n unix lines, since it is the way things already are.
( and has the bonus of saving one character per line ;-] )
Standards are nice, but holy wars speak louder :-)
Or the "standard line encoding" could be stored in _darcs/prefs/

OTOH, what would happen if darcs mistakes a binary file for a text
file? In this case the line ending conversion would effectively corrupt
the repo.
I'm still for the KISS principle. Let's not add bloat to darcs!
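
For what it's worth, a common guard against converting binary data is to look for NUL bytes before touching line endings. The sketch below is that heuristic only, under the assumption that text files never contain a zero byte; looks_binary is an invented name and this is not a description of how darcs classifies files.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Heuristic: treat the buffer as binary if it contains a NUL byte.
   It can misjudge in both directions (UTF-16 text contains NULs, and
   some binary formats contain none), so a wrong guess is still possible. */
int looks_binary(const unsigned char *buf, size_t n) {
    assert(buf != NULL || n == 0);
    return n > 0 && memchr(buf, 0, n) != NULL;
}
```

A conversion step would run only when looks_binary returns 0, and an explicit list of binary patterns could override the guess either way.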
Post by Michael Conrad
So, the easiest solution would be for windows versions of darcs to
automatically strip \r. Another solution is to make Ned use an editor
capable of unix newlines, and get Ned to turn on that feature, which might
have to be enabled on a per-project basis. I think we all know the futility
of the second option. Its usually effort-enough just to get Ned to use
version control.
In the case that you have a samba-mounted home directory, you could of
course manually add the pref for line endings, since this is something that
you deal with and understand.
What would the correct line ending be for a repo that's on a drive that
you mount from Unix as well as from Windows?

Martin
Michael Conrad
2004-10-30 22:16:31 UTC
Post by Martin Schaffner
This default would be determined at compile time,
This would mean that the version of darcs alone would not tell you what
behaviour it shows.
If a "default line ending" feature goes into darcs, then it should be
per-repo and not per-darcs-executable
I think it should be a property of the repo instance, but determined by the
executable/global pref at the time the repo is initialized.
It makes sense to me that Windows exes would default to windows lines on any
repo that the user creates/gets. Likewise, a unix user should get unix line
endings by default. I also think this would be easy to set in the
.configure script. If you want to override it, then set the global pref.
Post by Martin Schaffner
I would vote for \n unix lines, since it is the way things already are.
( and has the bonus of saving one character per line ;-] )
Standards are nice, but holy wars speak louder :-)
Or the "standard line encoding" could be stored in _darcs/prefs/
I think the patch encoding should be the same across all repos, so that
people can get and pull each other's data without worrying about repo type.
I think the encoding of the working copy should be platform specific, to
best interact with the tools you use on the working files.
Post by Martin Schaffner
OTOH, what would happen if darcs mistakes a binary file for a text
file? In this case the line ending conversion would effectively corrupt
the repo.
Mis-recognition of binaries is a problem no matter what. The sooner you
find the mistake, the better.
Post by Martin Schaffner
I'm still for the KISS principle. Let's not add bloat to darcs!
It'd be a rather small change. One new global pref, one new repo-specific
pref, 3 possible values, and simple changes to the routines that read or
write lines. In Java, it would be as simple as subclassing BufferedReader
and PrintWriter.
Post by Martin Schaffner
What would the correct line ending be for a repo that's on a drive that
you mount from Unix as well as from Windows?
The pref found in the repo. It would default to something based on which
executable you used to get/create the repo, or the pref in your home
directory. If you don't like the default, override it.

Essentially, I want this to be just like the email-addr repo-property. It's
per-repo, but you can declare a default for new repos. The difference is
that the email-addr property prompts for the initial value, where I think
the default line encoding should be chosen by platform. This also makes it
totally transparent to unix-only users.
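
To make the lookup order described above concrete, here is a minimal Python
sketch. It is only an illustration: darcs has no such pref, and the pref
names ("unix", "windows", "mac") and the function resolve_line_ending are
hypothetical. The thread only says there would be three possible values and
that the repo pref would win over the global pref, which in turn would win
over the executable's built-in default.

```python
from typing import Optional

# Hypothetical pref values and the terminators they would stand for.
ENDINGS = {"unix": "\n", "windows": "\r\n", "mac": "\r"}


def resolve_line_ending(repo_pref: Optional[str],
                        global_pref: Optional[str],
                        executable_default: str) -> str:
    """Pick the terminator for working files.

    Order assumed here: the pref stored in the repo, then the pref in the
    user's home directory, then the default built into the executable.
    """
    for name in (repo_pref, global_pref, executable_default):
        if name is not None:
            return ENDINGS[name]
    raise ValueError("no line-ending preference available")


# A repo with no pref of its own, created with a Windows build of the tool.
fresh_on_windows = resolve_line_ending(None, None, "windows")

# A repo on a shared drive whose own pref overrides everything else.
shared_drive = resolve_line_ending("unix", "windows", "windows")
```

With this order, a repo on a drive mounted from both Unix and Windows
behaves the same from either side, because the answer comes from the repo
and not from whichever executable happens to open it.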

-Mike
Igor Bukanov
2004-10-31 12:06:09 UTC
Martin Schaffner wrote:
...
...
Post by Martin Schaffner
Post by Michael Conrad
So, the easiest solution would be for windows versions of darcs to
automatically strip \r. Another solution is to make Ned use an editor
capable of unix newlines, and get Ned to turn on that feature, which might
have to be enabled on a per-project basis. I think we all know the futility
of the second option. It's usually effort enough just to get Ned to use
version control.
In the case that you have a samba-mounted home directory, you could of
course manually add the pref for line endings, since this is something that
you deal with and understand.
What would the correct line ending be for a repo that's on a drive that
you mount from Unix as well as from Windows?
As it was already pointed out, there are 2 cases to consider:

1) New text goes to the _darcs repository. Ideally I would like darcs to
accept all 3 line endings, \r\n, \n and \r even if they are mixed in the
single text file. That is darcs should split the text file into lines
and work with logical lines and not their physical presentation.
Internally the lines can be separated by \n but that is OK since if
somebody wishes to edit stored patch files, then the person can deal
with line ending there as well.

In that way it would not matter in which format data is stored in the
working directory, and it is not an issue even if the drive is accessible
from different OSes with different tools.

2) Darcs generates text output from the context of _darcs repository. In
this case it would be nice IMO if there is a flag to specify a particular
line ending for the working files and patches which defaults to platform
specific line ending or existing file context. That is, for new files or
output to stdout darcs should use this flag while for the existing files
the present line endings should be preserved.

Again the case of accessing the working directory from different OSes
would not matter. Yes, the working directory may end up with text files
having different line separators, but given 1) that would not confuse
darcs, and to avoid confusing tools (like notepad on Windows ;) the
explicit line ending for the output can be specified.
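
As a rough illustration of the two cases above, here is a Python sketch
(not darcs code; the names split_logical_lines and join_lines are made up
for this example). Reading accepts \r\n, \n and \r even when mixed in one
file, and writing takes an explicit ending that falls back to the platform
default.

```python
import os
import re
from typing import List, Optional

# \r\n is listed first so it is consumed as one terminator, not as a \r
# followed by a separate \n.
_LINE_END = re.compile(rb"\r\n|\n|\r")


def split_logical_lines(data: bytes) -> List[bytes]:
    """Split raw file contents into lines, dropping the terminators."""
    lines = _LINE_END.split(data)
    # A trailing terminator leaves one empty final element; drop it.
    # (This sketch does not remember whether the last line was terminated.)
    if lines and lines[-1] == b"":
        lines.pop()
    return lines


def join_lines(lines: List[bytes], ending: Optional[bytes] = None) -> bytes:
    """Write logical lines back out, terminating each one with `ending`."""
    if ending is None:
        ending = os.linesep.encode("ascii")  # platform default
    return b"".join(line + ending for line in lines)


mixed = b"one\r\ntwo\nthree\rfour\n"
lines = split_logical_lines(mixed)
stored = join_lines(lines, b"\n")      # internal form, \n only
windows = join_lines(lines, b"\r\n")   # output with an explicit flag
```

Because the comparison happens on the logical lines, a working file with
mixed separators produces the same stored form as a clean one.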

Regards, Igor
Michael Conrad
2004-10-31 19:24:32 UTC
Post by Igor Bukanov
Post by Martin Schaffner
What would the correct line ending be for a repo that's on a
drive that you mount from Unix as well as from Windows?
1) New text goes to the _darcs repository. Ideally I would like darcs to
accept all 3 line endings, \r\n, \n and \r even if they are mixed in the
single text file. That is darcs should split the text file into lines
and work with logical lines and not their physical presentation.
Internally the lines can be separated by \n but that is OK since if
somebody wishes to edit stored patch files, then the person can deal
with line ending there as well.
I'll second that. The only catch would be if someone needed to have a \r in
the middle of a line, although I can't conceive of a situation where that
would be desirable. (additionally, it would only be possible on UNIX, and
darcs seems to default to settings that are platform-independent).
Post by Igor Bukanov
2) Darcs generates text output from the context of _darcs repository. In
this case it would be nice IMO if there is a flag to specify a particular
line ending for the working files and patches which defaults to platform
specific line ending or existing file context. That is, for new files or
output to stdout darcs should use this flag while for the existing files
the present line endings should be preserved.
I'm imagining that darcs would read the file into logical lines, apply the
patches, then write them back out using encoding X.

If you wanted to preserve the encoding, you'd have to analyze the lines as
you read them to figure out what encoding they use. Then remember that flag
for when you write it again.

I'd almost rather it just always write using the repo-specified encoding,
since it's less code to write. But, I guess I wouldn't mind either way.
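
The two alternatives above can be sketched in Python as follows. This is
only an illustration under stated assumptions, not how darcs works:
detect_ending, rewrite and append_line are hypothetical names, and picking
the most frequent terminator is one possible way to "figure out" the
encoding of a file.

```python
import re
from collections import Counter
from typing import Callable, List, Optional

_LINE_END = re.compile(rb"\r\n|\n|\r")


def detect_ending(data: bytes, fallback: bytes = b"\n") -> bytes:
    """Return the most frequent terminator in `data`, or `fallback` if none."""
    counts = Counter(_LINE_END.findall(data))
    if not counts:
        return fallback
    return counts.most_common(1)[0][0]


def rewrite(data: bytes,
            edit: Callable[[List[bytes]], List[bytes]],
            repo_ending: Optional[bytes] = None) -> bytes:
    """Read logical lines, apply `edit`, and write the result back out.

    If `repo_ending` is given it is always used (the simpler option);
    otherwise the ending detected in the input is preserved.
    """
    ending = repo_ending if repo_ending is not None else detect_ending(data)
    lines = _LINE_END.split(data)
    if lines and lines[-1] == b"":
        lines.pop()
    return b"".join(line + ending for line in edit(lines))


def append_line(lines: List[bytes]) -> List[bytes]:
    """Stand-in for applying a patch: add one line at the end."""
    return lines + [b"added"]


original = b"a\r\nb\r\n"
preserved = rewrite(original, append_line)                  # keeps \r\n
forced = rewrite(original, append_line, repo_ending=b"\n")  # repo setting wins
```

The preserving variant costs one extra pass over the terminators; the
forced variant needs no detection at all, which matches the "less code"
argument.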

-Mike
Martin Schaffner
2004-10-31 18:24:27 UTC
Post by Michael Conrad
Post by Martin Schaffner
This default would be determined at compile time,
This would mean that the version of darcs alone would not tell you what
behaviour it shows.
If a "default line ending" feature goes into darcs, then it should be
per-repo and not per-darcs-executable
I think it should be a property of the repo instance, but determined by the
executable/global pref at the time the repo is initialized.
It makes sense to me that Windows exes would default to windows lines on any
repo that the user creates/gets. Likewise, a unix user should get unix line
endings by default. I also think this would be easy to set in the
.configure script. If you want to override it, then set the global pref.
Post by Martin Schaffner
Or the "standard line encoding" could be stored in _darcs/prefs/
I think the patch encoding should be the same across all repos, so that
people can get and pull each other's data without worrying about repo type.
I think the encoding of the working copy should be platform specific, to
best interact with the tools you use on the working files.
If I get this right, you suggest there are two line endings in use: The
patches' line endings (for which you propose '\n'), and the repo line
endings (what is in the source tree one edits).
This leaves some questions:
* What line endings should _darcs/current have?
* If I switch the repo-line-ending, does darcs have to switch all line
endings, or will "darcs check" tell me everything is not all right?
Post by Michael Conrad
Post by Martin Schaffner
OTOH, what would happen if darcs mistakes a binary file for a text
file? In this case the line ending conversion would effectively corrupt
the repo.
Mis-recognition of binaries is a problem no matter what. The sooner you
find the mistake, the better.
Currently, darcs gives me everything back in the state I gave it to
darcs. I'd like this to stay true.
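
A small Python sketch of this concern (an illustration only; to_repo_form,
to_working_form and round_trips are hypothetical names, not darcs
functions). It normalizes data to \n on the way in and expands to a chosen
ending on the way out, then checks whether the bytes come back unchanged.
The 8-byte PNG file signature is used as the binary sample because it
contains both \r\n and a lone \n.

```python
import re

# The fixed 8-byte signature at the start of every PNG file.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def to_repo_form(data: bytes) -> bytes:
    """Normalize every terminator to LF, as a text-only converter would."""
    return re.sub(rb"\r\n|\r", b"\n", data)


def to_working_form(data: bytes, ending: bytes) -> bytes:
    """Expand LF to the ending chosen for the working copy."""
    return data.replace(b"\n", ending)


def round_trips(data: bytes, ending: bytes) -> bool:
    """True if storing and then checking out returns the original bytes."""
    return to_working_form(to_repo_form(data), ending) == data


clean_text_ok = round_trips(b"a\r\nb\r\n", b"\r\n")   # consistent endings
mixed_text_ok = round_trips(b"a\r\nb\n", b"\r\n")     # mixed endings
binary_ok_crlf = round_trips(PNG_SIGNATURE, b"\r\n")  # binary, Windows pref
binary_ok_lf = round_trips(PNG_SIGNATURE, b"\n")      # binary, Unix pref
```

Only the text file with consistent endings survives the round trip. The
mixed-ending file and the binary sample both come back changed, whichever
ending is chosen for the binary.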
Post by Michael Conrad
Post by Martin Schaffner
I'm still for the KISS principle. Let's not add bloat to darcs!
It'd be a rather small change. One new global pref, one new repo-specific
pref, 3 possible values, and simple changes to the routines that read or
write lines. In Java, it would be as simple as subclassing BufferedReader
and PrintWriter.
The implementation might not be that huge, but it makes the UI more
complex as well. I'd have to worry about three line endings: The global
pref (I take it you're speaking about the setting in the home dir, and
not the patches' line ending), the repo-specific pref, and the line
ending of my editor. And the additional cost of not being sure that
darcs gives me the data back in the state I gave it to darcs.
Post by Michael Conrad
Post by Martin Schaffner
What would the correct line ending be for a repo that's on a drive that
you mount from Unix as well as from Windows?
The pref found in the repo. It would default to something based on which
executable you used to get/create the repo, or the pref in your home
directory. If you don't like the default, override it.
So this is one more thing I'll have to worry about: On which machine
should I create the repo, or if I'm on the "wrong" machine, I'll have
to change the pref (and do "darcs repair"?)
Post by Michael Conrad
Essentially, I want this to be just like the email-addr repo-property. It's
per-repo, but you can declare a default for new repos. The difference is
that the email-addr property prompts for the initial value, where I think
the default line encoding should be chosen by platform. This also makes it
totally transparent to unix-only users.
Except if they get the idea to "cp -r" a repo from a Windows machine,
for example because it contains unrecorded changes...

--
Martin
Michael Conrad
2004-10-31 20:19:25 UTC
Post by Michael Conrad
Post by Martin Schaffner
This default would be determined at compile time,
This would mean that the version of darcs alone would not tell
you what behaviour it shows.
If a "default line ending" feature goes into darcs, then it
should be per-repo and not per-darcs-executable
I think it should be a property of the repo instance, but
determined by the executable/global pref at the time the repo
is initialized. It makes sense to me that Windows exes would
default to windows lines on any repo that the user creates/gets.
Likewise, a unix user should get unix line endings by default.
I also think this would be easy to set in the .configure script.
If you want to override it, then set the global pref.
Post by Martin Schaffner
Or the "standard line encoding" could be stored in _darcs/prefs/
I think the patch encoding should be the same across all repos,
so that people can get and pull each other's data without worrying
about repo type.
I think the encoding of the working copy should be platform
specific, to best interact with the tools you use on the working
files.
The patches' line endings (for which you propose '\n'), and the repo
line endings (what is in the source tree one edits).
* What line endings should _darcs/current have?
* If I switch the repo-line-ending, does darcs have to switch all
line endings, or will "darcs check" tell me everything is not all right?
First, see Igor's post. I like his idea.
Then, _darcs/current wouldn't matter, but they might as well be in
user-specified encoding, so the user can play with them with minimal fuss.
Except if they get the idea to "cp -r" a repo from a Windows
machine, for example because it contains unrecorded changes...
Also solved by Igor's idea.
The implementation might not be that huge, but it makes the UI more
complex as well. I'd have to worry about three line endings: The
global pref (I take it you're speaking about the setting in the home
dir, and not the patches' line ending), the repo-specific pref, and
the line ending of my editor. And the additional cost of not being
sure that darcs gives me the data back in the state I gave it to darcs.
Post by Michael Conrad
Post by Martin Schaffner
What would the correct line ending be for a repo that's on a
drive that you mount from Unix as well as from Windows?
So this is one more thing I'll have to worry about: On which machine
should I create the repo, or if I'm on the "wrong" machine, I'll have
to change the pref (and do "darcs repair"?)
If the default is platform-native, and you only have one platform, there's
nothing to worry about, and considerable benefit if project members use
different platforms.

If you use multi-platform arrangements, you usually have to standardize
anyway. I can't see why someone would want to have text files with mixed
encodings. On my setup (Win2K dual-head with Gentoo single-head and FreeBSD
fileserver with numerous shared filesystems) I gave up on managing the
encodings and standardized on \n.

And just in case there's any confusion, I'm saying that EACH COPY of the
repo would have an encoding preference. If you create the repo with the
wrong default, just pull it into a repo with the right default. And, with
Igor's idea, you could change the default of an existing repo without any
adverse effects. (though it wouldn't automatically change the encoding of
the existing working files)

-Mike