Discussion:
[Gluster-users] gluster 3.2.0 - totally broken?
Udo Waechter
2011-05-18 12:45:19 UTC
Permalink
Hi there,
after reporting some trouble with group access permissions,
http://gluster.org/pipermail/gluster-users/2011-May/007619.html (which
still persists, btw.)

things get worse and worse with each day.

Now we see a lot of duplicate files (again, only on fuse clients here),
access permissions are reset on a random and totally annoying basis, and
files are empty from time to time. A listing looks like this:
-rwxrws--x 1 user1 group2 594 2011-02-04 18:43 preprocessing128.m
-rwxrws--x 1 user1 group2 594 2011-02-04 18:43 preprocessing128.m
-rwxrws--x 1 user2 group2 531 2011-03-03 10:47 result_11.mat
------S--T 1 root group2 0 2011-04-14 07:57 result_11.mat
-rwxrws--x 1 user1 group2 11069 2010-12-02 14:53 trigger.odt
-rwxrws--x 1 user1 group2 11069 2010-12-02 14:53 trigger.odt

where group2 are secondary groups.

How come there are these empty and duplicate files? Again, this
listing is from the fuse mount.
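
(For what it's worth, the zero-byte ------S--T entries look like they could be
DHT link files - pointer entries gluster normally keeps only on the bricks -
showing up on the client. A way to check, assuming direct access to a backend
brick; the brick path below is just a placeholder:

getfattr -d -m . /data/brick1/path/to/result_11.mat

A DHT link file is 0 bytes, carries the sticky bit and a
trusted.glusterfs.dht.linkto xattr naming the subvolume that holds the real
data.)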

Could it be that version 3.2.0 is totally borked?

Btw.:
Stephan von Krawczynski
2011-05-18 13:44:48 UTC
Permalink
On Wed, 18 May 2011 14:45:19 +0200
Post by Udo Waechter
Hi there,
after reporting some trouble with group access permissions,
http://gluster.org/pipermail/gluster-users/2011-May/007619.html (which
still persist, btw.)
things get worse and worse with each day.
[...]
Currently our only option seems to be to move away from glusterfs to some
other filesystem, which would be a bitter decision.
Thanks for any help,
udo.
Hello Udo,

unfortunately I can only confirm your problems. The last known-to-work version
we see is 2.0.9. Everything beyond is just bogus.
3.X did not solve a single issue but brought quite a lot of new ones instead.
The project only gained featurism but did not solve the very basic problems.
Up to the current day there is no way to see a list of not-synced files on a
replication setup; that is ridiculous. Ever since 2.0.9 I have hoped that
someone would do a fork and really attack the basics. IOW: good idea, pretty
bad implementation, no will to listen or learn.

Regards,
Stephan
Post by Udo Waechter
--
Institute of Cognitive Science - System Administration Team
Albrechtstrasse 28 - 49076 Osnabrueck - Germany
Tel: +49-541-969-3362 - Fax: +49-541-969-3361
https://doc.ikw.uni-osnabrueck.de
Whit Blauvelt
2011-05-18 13:54:44 UTC
Permalink
paul simpson
2011-05-18 14:05:27 UTC
Permalink
hi guys,

we're using 3.1.3 and i'm not moving off it. i totally agree with stephan's
comments: the gluster devs *need* to concentrate on stability before adding
any new features. it seems gluster dev is sales driven - not tech focused.
we need fewer new buzz words - and more solid foundations.

gluster is a great idea - but is in danger of falling short and failing if
the current trajectory is not altered. greater posix compatibility
(permissions, NLM locking) should be a prerequisite for an NFS server. hell,
the documentation is terrible; it's hard for us users to contribute to the
community when we are groping around in the dark too.

question : is anyone using 3.2 in a real world production situation?

regards to all,

-paul


On 18 May 2011 14:54, Whit Blauvelt <***@transpect.com> wrote:
Anthony J. Biacco
2011-05-18 16:56:06 UTC
Permalink
I'm using it in real-world production, with a lot of small files (apache
webroot mounts mostly). I saw a bunch of split-brain and self-heal
failures when I first did the switch. After I removed and recreated the
dirs it has seemed to be fine for about a week now; yeah, not long, I know.

I 2nd the notion that it'd be nice to see a list of what files/dirs
gluster thinks are out of sync or can't heal. Right now you gotta go
diving into the logs.
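
Until there is a real command for that, one workaround is to look at the AFR
changelog xattrs on the backend bricks directly - a sketch only, assuming a
replica volume and shell access to the bricks; volume name and brick path are
placeholders:

# non-zero trusted.afr.<volname>-client-N values mean pending
# self-heal against that replica
getfattr -d -m trusted.afr -e hex /data/brick1/path/to/file

# crude scan of a whole brick (slow on large bricks)
find /data/brick1 -type f -print0 | \
  xargs -0 getfattr -d -m trusted.afr -e hex 2>/dev/null | less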



I'm actually thinking of downgrading to 3.1.3 from 3.2.0. Wonder if I'd
have any ill-effects on the volume with a simple rpm downgrade and
daemon restart.



-Tony

---------------------------

Manager, IT Operations

Format Dynamics, Inc.

P: 303-228-7327

F: 303-228-7305

***@formatdynamics.com

http://www.formatdynamics.com



Udo Waechter
2011-05-18 17:00:30 UTC
Permalink
Post by Anthony J. Biacco
I'm actually thinking of downgrading to 3.1.3 from 3.2.0. Wonder if I'd have any ill-effects on the volume with a simple rpm downgrade and daemon restart.
I read somewhere in the docs that you need to reset the volume options beforehand:

gluster volume reset <volname>
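
Roughly, the whole downgrade might look like this on an rpm-based install - a
sketch only, untested, with package file names, server/mount names and the
volume name as placeholders, and a backup of /etc/glusterd strongly advised:

# on each server
gluster volume reset <volname>      # drop any 3.2-only options first
gluster volume stop <volname>
service glusterd stop
rpm -Uvh --oldpackage glusterfs-core-3.1.3-1.x86_64.rpm glusterfs-fuse-3.1.3-1.x86_64.rpm
service glusterd start
gluster volume start <volname>

# on each client
umount /mnt/gluster
rpm -Uvh --oldpackage glusterfs-core-3.1.3-1.x86_64.rpm glusterfs-fuse-3.1.3-1.x86_64.rpm
mount -t glusterfs server1:/<volname> /mnt/gluster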

good luck. Would be nice to hear if it worked for you.
--udo.
--
:: udo waechter - ***@zoide.net :: N 52º16'30.5" E 8º3'10.1"
:: genuine input for your ears: http://auriculabovinari.de
:: your eyes: http://ezag.zoide.net
:: your brain: http://zoide.net
Tomasz Chmielewski
2011-05-18 17:04:32 UTC
Permalink
Post by Anthony J. Biacco
I’m using it in real-world production, lot of small files (apache
webroot mounts mostly). I’ve seen a bunch of split-brain and self-heal
failing when I first did the switch. After I removed and recreated the
dirs it seemed to be fine for about a week now; yeah not long, I know.
I 2^nd the notion that it’d be nice to see a list of what files/dirs
gluster thinks is out of sync or can’t heal. Right now you gotta go
diving into the logs.
I’m actually thinking of downgrading to 3.1.3 from 3.2.0. Wonder if I’d
have any ill-effects on the volume with a simple rpm downgrade and
daemon restart.
I've been using 3.2.0 for a while, but I had a problem with userspace
programs "hanging" on accessing some files on the gluster mount
(described here on the list).

I downgraded to 3.1.4 (remove 3.2.0 rpm and config, install 3.1.4 rpm,
add nodes) and it works fine for me.

3.0.x was also crashing for me when SSHFS-like mount was used to the
server with gluster mount (and reads/writes were made from the gluster
mount through it).
--
Tomasz Chmielewski
http://wpkg.org
Justice London
2011-05-18 19:48:14 UTC
Permalink
I had issues with hanging of mounts as well with 3.2. I fixed it via upping
the number of connections allowed to the fuse mount... by default it's
something silly low like 16. I don't quite understand why the basic options
are not at least slightly optimized for more than a few connections. Try
this and see if it helps with 3.2. It helped me, and it applies to 3.1.x
as well...

Edit: /etc/glusterd/vols/<volname>/<volname>-fuse.vol


<Replicate and brick entries above here>

volume <volname>-write-behind
    type performance/write-behind
    option cache-size 4MB
    option flush-behind on
    subvolumes <volname>-replicate-0
end-volume

volume <volname>-read-ahead
    type performance/read-ahead
    option page-count 4
    subvolumes <volname>-write-behind
end-volume

volume <volname>-io-cache
    type performance/io-cache
    option cache-size 768MB
    option cache-timeout 1
    subvolumes <volname>-read-ahead
end-volume

volume <volname>-quick-read
    type performance/quick-read
    option cache-timeout 1
    option cache-size 768MB
    option max-file-size 64kB
    subvolumes <volname>-io-cache
end-volume

#volume <volname>-stat-prefetch
#    type performance/stat-prefetch
#    subvolumes <volname>-quick-read
#end-volume

volume <volname>
    type debug/io-stats
    option latency-measurement off
    option count-fop-hits off
#   subvolumes <volname>-stat-prefetch
    subvolumes <volname>-quick-read
end-volume
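
Since this is the client-side volfile, the change only takes effect once the
client re-reads it; roughly (mount point and server name are placeholders):

umount /mnt/gluster
mount -t glusterfs server1:/<volname> /mnt/gluster

Also note that glusterd may regenerate these volfiles when volume options are
changed, so hand edits can get overwritten.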

Justice London

Tomasz Chmielewski
2011-05-18 20:12:49 UTC
Permalink
Post by Justice London
I had issues with hanging of mounts as well with 3.2. I fixed it via upping
the number of connections allowed to the fuse mount... by default it's
something silly low like 16.
What option exactly is it, and where do I set it?

That could be it, as the mount point is accessed pretty heavily
(interesting why 3.1.4 doesn't show this behaviour though).
--
Tomasz Chmielewski
http://wpkg.org
Justice London
2011-05-18 19:49:12 UTC
Permalink
Whoops, and forgot the threads edit for the brick instance config:

volume <volname>-io-threads
    type performance/io-threads
    option thread-count 64
    subvolumes <volname>-locks
end-volume

Justice London
Systems Administrator

phone 800-397-3743 ext. 7005
fax 760-510-0299
web www.lawinfo.com
e-mail ***@lawinfo.com

Burnash, James
2011-05-18 19:57:28 UTC
Permalink
I believe that it is more consistent and repeatable to just use the gluster command to set this. Example from this page: http://www.gluster.com/community/documentation/index.php/Gluster_3.1:_Setting_Volume_Options

gluster volume set VOLNAME performance.io-thread-count 64

This also means that your changes will persist across any other "gluster volume set" commands. Generally speaking, hand editing the volume config files is a bad idea, IMHO.
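
A quick sketch of that workflow (volume name is a placeholder):

gluster volume set <volname> performance.io-thread-count 64
gluster volume info <volname>     # "Options Reconfigured" shows what has been set
gluster volume reset <volname>    # drops the custom options again if needed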

James

Justice London
2011-05-18 20:08:24 UTC
Permalink
Ah well, I was actually looking around to see if with 3.2 there was a
command to set the performance options... that answers that question. Either way,
though, I think setting the threads will help a lot.
Justice London

Mohit Anchlia
2011-05-18 20:14:07 UTC
Permalink
Running that command will set that option only on the server side. But
it looks like you want it in the client volume file, for which there is
currently no command.
Post by Justice London
Ah, well I was actually looking around to see if with 3.2 there was a
command to set the performance options... answers that question. Either way,
though, I think setting the threads will help a lot.
Justice London
Anthony J. Biacco
2011-05-18 20:19:11 UTC
Permalink
I never really understood that. If you set a volume option with the command line, do you then have to put it in all the client mount files too and remount them?

-Tony
---------------------------
Manager, IT Operations
Format Dynamics, Inc.
P: 303-228-7327
F: 303-228-7305
***@formatdynamics.com
http://www.formatdynamics.com
Tomasz Chmielewski
2011-05-18 20:09:57 UTC
Permalink
When you say you removed the config before and added the nodes after, do
you mean you deleted the volume and recreated it?
Yes.

I think I tried to do this without removing the config first, but 3.1.4
was complaining upon startup. Also, I've read about config problems with
3.1.4 -> 3.2.0 updates, so I figured it would be easiest for me to
totally remove 3.2.0 and its config and start from scratch with 3.1.4.
--
Tomasz Chmielewski
http://wpkg.org
Anand Babu Periasamy
2011-05-18 20:16:59 UTC
Permalink
GlusterFS is completely free. The same versions released to the community are
used for commercial deployments too; their issues get higher priority,
though. Code related to other proprietary software such as VMware, AWS, and
RightScale is kept proprietary.

We acknowledge that we have done a poor job when it comes to managing the
community, documentation and bug tracking. While we have improved a lot since
the 2.x versions, I agree we are not there yet. We recently hired a lot of
engineers to focus specifically on testing and bug fixes. The QA team is
growing steadily. The lab size has been doubled. A new QA lead is joining us
next month. The QA team will have closer interaction with the community moving
forward. We also appointed Dave Garnett from HP as VP of product management
and Vidya Sakar from Sun/Oracle as engineering manager.

We fully understand the importance of community. Paid vs non-paid should not
matter when it comes to the quality of the software. Intangible contributions
from the community are equally valuable to the success of the GlusterFS
project. We have appointed John Mark Walker as community manager. We launched
the community.gluster.org site recently. Starting next month, we will have
regular community sessions. Problems raised by the community will also get
prioritized.

We are redoing the documentation completely. The new system will be based on
Red Hat's Publican. The documentation team, too, will work closely with the
community.

*Criticism is taken positively, so please don't hesitate.*
Thanks!
-ab
paul simpson
2011-05-18 23:23:04 UTC
Permalink
great to know - this is very reassuring to hear! i know it's early days for
a file-system - and the fact that so many people are using it so quickly (say,
as compared to BTRFS) is amazing. i think there's lots of goodwill here -
which can/will translate into an even more vibrant community. i look forward
to seeing these new developments roll out.

-paul
Stephan von Krawczynski
2011-05-20 09:15:18 UTC
Permalink
Sorry, this clearly shows the problem: understanding.
It really does not help you a lot to hire a big number of people; you do not
fail in terms of business relations. Your problem is the _code_. You need a
filesystem expert. A _real_ one, not _some_ one. Like, let's say, Daniel
Phillips, Theodore "Ted" Ts'o or the like.
--
Regards,
Stephan
Jeff Darcy
2011-05-20 12:35:35 UTC
Permalink
Post by Stephan von Krawczynski
Sorry, this clearly shows the problem: understanding. It really does
not help you a lot to hire a big number of people, you do not fail in
terms of business relation. Your problem is the _code_. You need a
filesystem expert. A _real_ one, not _some_ one. Like lets say
Daniel Phillips, Theodore "Ted" Ts'o or the like.
I know both Daniel and Ted professionally. As a member of the largest
Linux filesystem group in the world, I am also privileged to work with
many other world-class filesystem experts. I also know the Gluster folks
quite well, and I can assure you that they have all the filesystem
expertise they need. They also have a *second* kind of expertise -
distributed systems - which is even more critical to this work and which
the vast majority of filesystem developers lack. What Gluster needs is
not more filesystem experts but more *other kinds* of experts as well as
non-experts and resources. The actions AB has mentioned are IMO exactly
those Gluster should be taking, and should be appreciated as such by any
knowledgeable observer.

Your flames are not only counter-productive but factually incorrect as
well. Please, if only for the sake of your own reputation, try to do
better.
Stephan von Krawczynski
2011-05-20 13:51:45 UTC
Permalink
Forgive my ignorance Jeff, but it is obvious to anyone having used glusterfs
for months or years that the guys have a serious software design issue. If you
look at the "tuning" options configurable in glusterfs you should notice that
most of them are just an outcome of not being able to find a working, i.e.
best, solution for a problem. cache-timeout? thread-count? quick-read?
stat-prefetch? Gimme a break. Being a fs I'd even say all the cache-size
parameters are bogus. When did you last tune the ext4 cache size or timeout?
Don't come up with ext4 being a kernel vs. userspace fs. It was their decision
to make it userspace, so don't blame me. As a fs with networking it has to
take the comparison with nfs - as most interested users come from nfs. The
first thing they experience is that glusterfs is really slow compared to their
old setups with nfs. And the cause is _not_ replication per se. And as long as
they cannot cope with nfs performance my argument stands: they have a problem,
be it inferior per design or per coding.
As you see I am not talking at all about things that I count as basics in a
replication fs. I mean, really, I cannot express my feelings about the lack of
information for the admin around replication. It's pretty much like a wheel of
your car just fell off and you cannot find out which one. Would you trust that
car?
Let me clearly state this: the idea is quite brilliant, but the coding is at
the stage of a design study and could have been far better if they had only
concentrated on the basics. If you want to build a house you don't buy the TV
set first...
--
Regards,
Stephan
Jeff Darcy
2011-05-20 14:15:36 UTC
Permalink
Post by Stephan von Krawczynski
Forgive my ignorance Jeff, but it is obvious to anyone having used glusterfs
for months or years that the guys have a serious software design issue.
No, it is not. I've been using and watching its development for years,
I know its code far better than you ever could, and I disagree. That's
one counterexample disproving "anyone" right there. Don't try to bluff me
with risible appeals to non-existent authority.
Post by Stephan von Krawczynski
If you
look at the "tuning" options configurable in glusterfs you should notice that
most of them are just an outcome of not being able to find a working i.e. best
solution for a problem. cache-timeout? thread-count? quick-read?
stat-prefetch?
Are you seriously saying that modularity and tuning parameters are bad?
Do you even know how many tuning options other filesystems such as ext4
or XFS have, or how many times they've iterated through different
internal algorithms to address various issues (especially scaling)? The
features you name are *all* configurable because some people need to
make different tradeoffs - often performance vs. resource consumption or
consistency - in their deployments. They can't just be "one size fits
all" values, and the Gluster developers are wise to allow this
flexibility.

I'm not going to engage you further on this Stephan, as long as you
demonstrate such complete ignorance of the issues involved and seem
interested in nothing but insulting those you should be thanking. If
GlusterFS is so bad, go away. Good luck with the alternatives, which I know
just as well and which are even more painful to deal with. When you're
capable of contributing constructively, your opinions will gain some weight.
Stephan von Krawczynski
2011-05-20 14:47:29 UTC
Permalink
Jeff, let me give a final word on that.
I have no assets with this company or any other linux company, and that's why
I seem to gain the role of the bad boy pretty often in scenarios where people
lose track of the obvious. If it is not obvious to you that there have long
been serious problems with both stability and performance, well, then there is
not much more I can tell you. You even ignore simple facts, like the fact that
I did not choose the subject line above. What more can one say than "read what
people write about their problems with this software". It would be a shame if
it all ended up like the btrfs dinosaur.

--
Regards,
Stephan
Tomasz Chmielewski
2011-05-20 15:01:22 UTC
Permalink
Post by Stephan von Krawczynski
most of them are just an outcome of not being able to find a working i.e. best
solution for a problem. cache-timeout? thread-count? quick-read?
stat-prefetch? Gimme a break. Being a fs I'd even say all the cache-size paras
are bogus. When did you last tune the ext4 cache size or timeout? Don't come
up with ext4 being kernel vs. userspace fs. It was their decision to make it
userspace, so don't blame me. As a fs with networking it has to take the
comparison with nfs - as most interested users come from nfs.
Ever heard of fsc (FS-Cache), acreg*, acdir*, actimeo options for NFS?

Yes, they are related to cache, and oh, NFS is kernelspace. And yes,
there are tunable timeout options for NFS as well.
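
For reference, those are plain NFS mount options; a small example (server,
export and values are placeholders, and fsc additionally needs cachefilesd
running on the client):

mount -t nfs -o fsc,actimeo=30 server:/export /mnt/nfs
# or finer-grained attribute-cache tuning instead of actimeo:
mount -t nfs -o acregmin=3,acregmax=60,acdirmin=30,acdirmax=60 server:/export /mnt/nfs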


As for timeout options with ext4, or any other local filesystem - if you
ever used iSCSI, you would also discover that it's recommended to set
reasonable timeout options there as well, depending on your network
infrastructure and usage/maintenance patterns. "Incidentally", iSCSI is
also kernelspace.
--
Tomasz Chmielewski
http://wpkg.org
Stephan von Krawczynski
2011-05-21 08:33:22 UTC
Permalink
On Fri, 20 May 2011 17:01:22 +0200
Post by Tomasz Chmielewski
Post by Stephan von Krawczynski
[...]
Ever heard of fsc (FS-Cache),
To my knowledge there is no persistent (disk-based) caching in glusterfs at
all ...
Post by Tomasz Chmielewski
acreg*, acdir*, actimeo options for NFS?
... as well as options only dealing with caching of file/dir attributes.
You are talking about completely different things here. If you want to argue
about that you should probably _request_ these types of options additionally
to the already existing ones.
Post by Tomasz Chmielewski
Yes, they are related to cache, and oh, NFS is kernelspace. And yes,
there are tunable timeout options for NFS as well.
The only reasonable configurable timeout in nfs is the rpc timeout.
Post by Tomasz Chmielewski
As of timeout options with ext4, or any other local filesystem - if you
ever used iSCSI, you would also discover that it's recommended to set
reasonable timeout options there as well, depending on your network
infrastructure and usage/maintenance patterns. "Incidentally", iSCSI is
also kernelspace.
And is it "incidentally" as slow as glusterfs in the same environment? No?
And did you ever manage to hard freeze your boxes with it? To show double
files? Not being able to open existing files? Wrong file dates? Wrong
UIDs/GIDs? Shall I continue to name problems we saw through all tested
versions of glusterfs? I don't, because I dropped the idea that it would be
helpful at all.
If you want to share helpful information, tell us how you would
default-configure glusterfs so it performs equally to nfs in most cases.
If you can't, what is your point then?
--
Regards,
Stephan
Tomasz Chmielewski
2011-05-21 11:27:38 UTC
Permalink
Post by Stephan von Krawczynski
On Fri, 20 May 2011 17:01:22 +0200
Post by Tomasz Chmielewski
Post by Stephan von Krawczynski
[...]
Ever heard of fsc (FS-Cache),
To my knowledge there is no persistent (disk-based) caching in glusterfs at
all ...
Correct, but just a while ago you questioned the idea of caching in the
filesystems ("When did you last tune the ext4 cache size or timeout").

It's amazing that you changed your mind so fast.


[cutting your ignorance about iSCSI and pretending you know it well]


I don't think there can be any constructive discussion with you, sorry.

If you found a bug, and even more, it's repeatable for you, please file
a bug report and describe the way to reproduce it.
Since the problems happen so often for you, I'm sure it shouldn't be so
hard to produce a good test case.

Initiating flame discussions is not really a good development model.
--
Tomasz Chmielewski
http://wpkg.org
Stephan von Krawczynski
2011-05-22 09:06:31 UTC
Permalink
On Sat, 21 May 2011 13:27:38 +0200
Post by Tomasz Chmielewski
If you found a bug, and even more, it's repeatable for you, please file
a bug report and describe the way to reproduce it.
Ha, very sorry that the project is not an easy-go for a dev. Creating
reproducible setups for software spread over 3 or more boxes is a pretty
complex thing to do. And even if something is reproducible on my side, that
does not mean it is with _other_ hardware and the same setup on the devs'
side. Drop the idea that this can be debugged with the same strategy you debug
"hello world". I stopped looking at the bugs long ago because the software
does not give you a chance to even find out when a problem started. If you
want to see something where you can find out for yourself what is going on,
look at netfilter. There you have tables and output in /proc about ongoing
NATs and open connections (the connection tracker).
In glusterfs you have exactly nothing, and if you stop the replication setup
at some point you need to ls terabytes of data to find the not-synced files.
This is complete nonsense and not worth looking at.

If you need input, how about reading Udo?

"I already mentioned the bugs that seem to describe the same problems. I
really do not think that creating new ones describing the same problems
would help. Maybe the old ones should be reopened. These bugs mentioned in:
http://gluster.org/pipermail/gluster-users/2011-May/007619.html are
basically the same.

Currently I really do not know how to describe/analyze the problem further.
"

?
Post by Tomasz Chmielewski
Initiating flame discussions is not really a good development model.
I did not start the topic, but I can well imagine the feelings of the first
poster. I was in the same situation more than a year ago and had to find out
that nobody cares to improve the fundamental strategy. And that people still
find out the same - months later - is the really bad news.
I have no doubt that we will read the same topics with a new version number in
a year.
--
Regards,
Stephan
Burnash, James
2011-05-18 15:09:42 UTC
Permalink
Based on my experiences so far, I would absolutely agree with you.

I know new releases are hard to produce at 100% coming out of the gate, so the fact that 3.2 is not all that robust is unsurprising to me. Hopefully the point releases improve this.

I would really like to see the known bugs in 3.1.4 fixed in a later point release in that branch.

I do agree slightly with Stephan (which is unusual) that features over stability seems to be the current direction for the project. The shame of that is that stability has got to be the #1 priority for any features to be useful. That said, 3.1.3 does seem pretty solid to me now, compared with 3.1.1 and with 3.0.4.

I disagree with Stephan about everything after 2.0.9 being "bogus". Just because the development direction does not correspond with a single individual's needs doesn't mean it's worthless or "totally broken" for others. The additional features for non-downtime changes to the storage nodes are very useful to us here - though more robust behavior and better documentation would be welcome.

Just my two cents

Jeff Darcy
2011-05-18 17:04:29 UTC
Permalink
Post by Burnash, James
[...]
As the leader for a project based on GlusterFS, I'm also very sensitive
to the stability issue. It is a bit disappointing when every major
release seems to be marked by significant regressions in existing
functionality. It's particularly worrying when even community leaders and
distro packagers report serious problems and those take a long time to
resolve. I'd put you in that category, James, along with JoeJulian and
daMaestro just with respect to the 3.1.4 and 3.2 releases. Free or paid,
that's not a nice thing to do to your marquee users, and you're the kind
of people whose interest and support we can hardly afford to lose. Even
I've been forced to take a second look at alternatives, and I'm just
about the biggest booster Gluster has who's not on their payroll.

So how do we deal with these issues *constructively*? Not by
characterizing every release since 2.0.9 as "bogus", that's for sure.
Sorry, Stephan, but that's absurd. The 3.1+ versions are not only more
manageable and add the non-disruptive configuration changes that James
mentions (*very* hard to do BTW), but there have also been significant
fixes to just about every area of the code. I see 284 high-severity bugs
whose resolution went to FIXED in the past year. Granted, some of those
were *introduced* in the past year, but the majority can easily be seen
to have existed in 2.0.9 or its predecessors and there are many more
that wouldn't have shown up on that search due to inconsistent use of
the bug system (I'll get to that in a moment). That's a lot of bugs that
definitely did affect someone, even if it wasn't you, especially since
storage and distributed systems are not the easiest programming domains
to work in. Watch the bug list or the patch stream closely, as I do, and
you'll see a constant stream of fixes to bugs that have clearly been
latent since 2.0.9 or earlier.

The problem I do see, and I do agree with others who've spoken out here,
is primarily one of communication. It's a bit frustrating to see dozens
of geosync/marker/quota patches fly by while a report of a serious bug
isn't even *assigned* (let alone worked on as far as anyone can tell)
for days or even weeks. I can only imagine how it must be for the people
whose filesystems have been totally down for that long, whose bosses are
breathing down their necks and pointedly suggesting that a technology
switch might be in order. We can all help by making sure our bugs are
actually filed on bugs.gluster.com - not just mentioned here or on IRC -
and by doing our part to provide the developers with the information
they need to reproduce or fix problems. We can help by actually testing
pre-release versions, particularly if our configurations/workloads are
likely to represent known gaps in Gluster's own test coverage. The devs
can help by marking bugs' status/assignment, severity/priority, and
found/fixed versions more consistently. The regression patterns in the
last few releases clearly indicate that more tests are needed in certain
areas such as RDMA and upgrades with existing data.

The key here is that if we want things to change we all need to make it
happen. We can't tell Gluster how to run their business, which includes
how they decide on features or how they allocate resources to new
features vs. bug fixes, but as a community we can give them clear and
unambiguous information about what is holding back more widespread
adoption. It used to be manageability; now it's bugs. We need to be as
specific as we possibly can about which bugs or shortcomings matter to
us, not just vague "it doesn't work" or "it's slow" or "it's not POSIX
enough" kinds of stuff, so that a concrete plan can be made to improve
the situation.
Joe Landman
2011-05-18 17:13:39 UTC
Permalink
[...]
Post by Jeff Darcy
As the leader for a project based on GlusterFS, I'm also very sensitive
to the stability issue. It is a bit disappointing when every major
release seems to be marked by significant regressions in existing
functionality. It's particularly worrying when even community leaders and
distro packagers report serious problems and those take a long time to
resolve. I'd put you in that category, James, along with JoeJulian and
daMaestro just with respect to the 3.1.4 and 3.2 releases. Free or paid,
that's not a nice thing to do to your marquee users, and you're the kind
of people whose interest and support we can hardly afford to lose. Even
I've been forced to take a second look at alternatives, and I'm just
about the biggest booster Gluster has who's not on their payroll.
Us as well ... we have a product that uses it as its base, so we are
obviously strong proponents of it.
Post by Jeff Darcy
So how do we deal with these issues *constructively*? Not by
characterizing every release since 2.0.9 as "bogus" that's for sure.
Agreed. Let's not get on this sort of track. I expect issues with early
revs, and I expect things to improve with each rev. When we find bugs
we do our best to submit them to bugs.gluster.com. I'd suggest everyone
get an account there and submit your bugs, especially if you have a way
to replicate the problem.

[...]
Post by Jeff Darcy
The problem I do see, and I do agree with others who've spoken out here,
is primarily one of communication. It's a bit frustrating to see dozens
of geosync/marker/quota patches fly by while a report of a serious bug
isn't even *assigned* (let alone worked on as far as anyone can tell)
for days or even weeks. I can only imagine how it must be for the people
whose filesystems have been totally down for that long, whose bosses are
breathing down their necks and pointedly suggesting that a technology
switch might be in order. We can all help by making sure our bugs are
actually filed on bugs.gluster.com - not just mentioned here or on IRC -
+1 Folks, get an account there, and report problems, even if you
haven't paid for support.

Second, if you haven't paid for support, and you are using it in a
production environment to either make money or support your mission,
please, help Gluster there as well. They aren't doing this project for
their own health; they need to show a demand for this in the market from
paying customers (just like Redhat, and every other company).
Post by Jeff Darcy
and by doing our part to provide the developers with the information
they need to reproduce or fix problems. We can help by actually testing
pre-release versions, particularly if our configurations/workloads are
likely to represent known gaps in Gluster's own test coverage. The devs
can help by marking bugs' status/assignment, severity/priority, and
found/fixed versions more consistently. The regression patterns in the
last few releases clearly indicate that more tests are needed in certain
areas such as RDMA and upgrades with existing data.
The key here is that if we want things to change we all need to make it
happen. We can't tell Gluster how to run their business, which includes
how they decide on features or how they allocate resources to new
features vs. bug fixes, but as a community we can give them clear and
unambiguous information about what is holding back more widespread
adoption. It used to be manageability; now it's bugs. We need to be as
specific as we possibly can about which bugs or shortcomings matter to
us, not just vague "it doesn't work" or "it's slow" or "it's not POSIX
enough" kinds of stuff, so that a concrete plan can be made to improve
the situation.
--
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics Inc.
email: ***@scalableinformatics.com
web : http://scalableinformatics.com
http://scalableinformatics.com/sicluster
phone: +1 734 786 8423 x121
fax : +1 866 888 3112
cell : +1 734 612 4615
Udo Waechter
2011-05-18 19:27:56 UTC
Permalink
+1 Folks, get an account there, and report problems, even if you haven't paid for support.
Second, if you haven't paid for support, and you are using it in a production environment to either make money or support your mission, please, help Gluster there as well. They aren't doing this project for their own health, they need to show a demand for this in market from paying customers (just like Redhat, and every other company).
Here the question arises: what is the difference between the paid and non-paid versions? Is there one? Or do paying customers only get the possibility to call support? If so, what good would that be if basic functionality does not work? Would one get instant bug-fixes? Do paying customers get better documentation? Do they get a warning about versions that obviously have problems?
From what I see here, http://www.gluster.com/services/, none of these services would actually help if the software provided by the company is as flawed as gluster seems to be.

We are using mostly open-source software. In my experience it usually makes no big difference whether one pays for a product or not. The commercial software that we use gives us the feeling that we could call someone and have our problems solved quickly, but the experience is usually the contrary: one is treated like someone who does not know a thing, and problems or bugs do not get solved any quicker. Sometimes it gives us headaches that we do not have someone to blame when a bug turns up in open-source software, but in most projects that do care, such grave bugs are solved quickly.
Just our experience...
--udo.
--
:: udo waechter - ***@zoide.net :: N 52º16'30.5" E 8º3'10.1"
:: genuine input for your ears: http://auriculabovinari.de
:: your eyes: http://ezag.zoide.net
:: your brain: http://zoide.net
Joe Landman
2011-05-18 19:41:38 UTC
Permalink
Post by Udo Waechter
Post by Joe Landman
+1 Folks, get an account there, and report problems, even if you
haven't paid for support.
Second, if you haven't paid for support, and you are using it in a
production environment to either make money or support your
mission, please, help Gluster there as well. They aren't doing
this project for their own health, they need to show a demand for
this in market from paying customers (just like Redhat, and every
other company).
Here, the question arises what the difference between the paid and
non-paid version is? Is there one? Or do paying customers only get
Yes.
Post by Udo Waechter
the possibility to call support? If so, what good would it be if
Their issues get priority.
Post by Udo Waechter
basic functionality does not work. Would one get instant bug-fixes?
There is no such thing as an instant bug fix, as I am sure you are aware.
Post by Udo Waechter
Do paying customers get better documentation? Do they get a warning
http://www.gluster.com/services/ nothing of these services would
actually help if the software provided by the company is flawed like
gluster seems to be.
We are using mostly opensource software. In my experience it usually
makes no big difference whether one pays for a product or not. On the
contrary. The commercial software that we use gives us the feeling
that we could call someone and have our problems solved quickly. The
experience is the contrary. Usually one is treated like someone who
does not know a thing and problems or bugs usually do not get solved
quicker. Sometimes it gives us headaches that we do not have someone
to blame for a bug when these turn up in Opensource software. Most
projects that do care such grave bugs are solved quickly. Just our
experience... --udo.
I won't respond to this here; this is for Gluster Inc. to respond to.

I am trying to get someone from the company to hopefully spend a bit
more time talking about the issues.
--
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics Inc.
email: ***@scalableinformatics.com
web : http://scalableinformatics.com
http://scalableinformatics.com/sicluster
phone: +1 734 786 8423 x121
fax : +1 866 888 3112
cell : +1 734 612 4615
Burnash, James
2011-05-19 14:24:54 UTC
Permalink
Jeff,

I certainly agree with your characterization of the issues with Gluster, and how they are handled by the company. The recent thread from
Anand was reassuring, and I'm optimistic that they will be able to execute as promised.

I really like this product, and I can only imagine the unbelievable complexity of developing it and of trying to test all of the regressions and as many edge cases as can be discovered from the mailing list as well as from brainstorming.

I have other questions, but I'll start a new thread since they are not really pertinent to this thread.

Oh, and BTW - I'm flattered to be characterized as a "community leader". It's much better than my self-applied moniker "almost certainly clueless tinkerer" :-)

James
Udo Waechter
2011-05-18 16:58:26 UTC
Permalink
Hi, and thanks for the answers.
From reading this list, I wonder if this would be an accurate summary of the current versions:
3.1.3 - most dependable current version
3.1.4 - gained a few bugs
3.2.0 - not stable
So 3.1.3 would be suitable for production systems, as long as the known bug
in mishandling Posix group permissions is worked around (by loosening
permissions).
Loosening permissions is not really an option for us. We have many projects/work groups, some of which have confidential (although encrypted) data. Most of the groups are pretty dynamic, and access permissions are the only way to somehow provide at least a base level of security.
Now, I'm not personally knowledgeable on any of this aside from the Posix
group problem. Just asking for confirmation or not of the basic sense I'm
getting from those with extensive experience that 3.1.3 is essentially
dependable, while 3.1.4 is problematic, and 3.2.0 should perhaps only be
used if you want to gain familiarity with the new geo-replication feature,
but avoided for current production use.
Yes. The trouble is that there is no real warning about these problems.
I would say that 3.2 should not be used at all. What good are new features if the basic features do not work?
If I read http://www.gluster.com/community/documentation/index.php/GlusterFS_General_FAQ#What_file_system_semantics_does_GlusterFS_Support.3B_is_it_fully_POSIX_compliant.3F

and it states that it is fully POSIX compatible, I am tempted to believe that. And I truly believed that. It took me about half a year to choose our next generation filesystem (moving away from multiple nfs-servers). POSIX compatibility was one of the top features required.
If the documentation states that something is the case, but then it turns out that the contrary is true, I am not sure if I can trust the rest of the project. Company or not.
If the documentation of a project states things that simply are not true, then there is no reason to use the software.

I am really disappointed. Gluster seemed like a really nice project. As it turns out, it's mainly bogus.
Now we need to go back to the drawing board and try to find an alternative.

Have a nice day,
udo.
--
:: udo waechter - ***@zoide.net :: N 52º16'30.5" E 8º3'10.1"
:: genuine input for your ears: http://auriculabovinari.de
:: your eyes: http://ezag.zoide.net
:: your brain: http://zoide.net
Anand Avati
2011-05-18 17:34:43 UTC
Permalink
3.1.4 - gained a few bugs
Can someone throw more light on this? We do not have any open bugs in
bugzilla marked against 3.1.4 - which means either

a) They were reported and fixed, but we haven't made a release yet

b) We have not been fixing it because we have not yet heard about it!

Avati
Whit Blauvelt
2011-05-18 18:34:37 UTC
Permalink
3.1.4 - gained a few bugs
Can someone throw more light on this? We do not have any open bugs in
bugzilla marked against 3.1.4 - which means either
a) They were reported and fixed, but we haven't made a release yet
b) We have not been fixing it because we have not yet heard about it!
Avati
I was summarizing statements on this list recently, to the effect that 3.1.4
was less dependable in production than 3.1.3. That can be true, and still be
consistent with those bugs having been reported and fixed, but you "haven't
made a release yet."
Anand Avati
2011-05-18 17:14:17 UTC
Permalink
Udo,
Do you know what kind of access was performed on those files? Were they
just copied in (via cp), were they rsync'ed over an existing set of data?
Was it data carried over from 3.1 into a 3.2 system? We hate to lose users
(community users or paid customers equally) and will do our best to keep you
happy. Please file a bug report with as much history as possible and we will
have it assigned on priority.

Thanks,
Avati

On Wed, May 18, 2011 at 5:45 AM, Udo Waechter <
Post by Udo Waechter
Hi there,
after reporting some trouble with group access permissions,
http://gluster.org/pipermail/gluster-users/2011-May/007619.html (which
still persist, btw.)
things get worse and worse with each day.
Now, we see a lot of duplicate files (again, only fuse-clients here),
access permissions are reset on a random and totally annoying basis. Files are empty from time to time and become:
-rwxrws--x 1 user1 group2 594 2011-02-04 18:43 preprocessing128.m
-rwxrws--x 1 user1 group2 594 2011-02-04 18:43 preprocessing128.m
-rwxrws--x 1 user2 group2 531 2011-03-03 10:47 result_11.mat
------S--T 1 root group2 0 2011-04-14 07:57 result_11.mat
-rwxrws--x 1 user1 group2 11069 2010-12-02 14:53 trigger.odt
-rwxrws--x 1 user1 group2 11069 2010-12-02 14:53 trigger.odt
where group2 are secondary groups.
How come that there are these empty and duplicate files? Again, this
listing is from the fuse-mount
Could it be that version 3.2.0 is totally borked?
Udo Waechter
2011-05-18 19:09:59 UTC
Permalink
Hi,
Udo,
Do you know what kind of access was performed on those files? Were they just copied in (via cp), were they rsync'ed over an existing set of data? Was it data carried over from 3.1 into a 3.2 system? We hate to lose
We started our first experiments (and quickly afterwards went into production) with our cloud infrastructure.
There, the use case is pretty easy:
one user and a shared (distributed) volume for >30 VMs. This worked pretty well pretty quickly, and we started using it in production. For months everything was great, and it still is.

The second use case was then to create a distributed+replicated volume for the user data.
No problem so far.
* then we rsynced the data from our old nfs-servers to the new volume
* We moved files and directories around within the volume
* We changed access permissions on those files, basically going to 0700 for users' personal directories and to 0750 for working-group directories. The latter also have "g+s" set to retain the group over edits/removes/creates of files/directories. These directories are giving us the problems described: duplicate files, vanishing files and such, all described in my previous posts. All the groups are secondary groups of the users.
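(For illustration, the scheme boils down to something like the following; the directory names are made up, the modes are the ones described above:)

chmod 0700 /data/home/user1             # users' personal directories: owner only
chgrp group2 /data/projects/groupdir    # working-group directory; group2 is a secondary group
chmod 2750 /data/projects/groupdir      # 0750 plus the setgid bit (g+s), so new files keep the group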

Last week we went into production with this volume, and as soon as people started working, all these problems turned up. The bad part is that they become worse with every day.

The most recent development is that these problems now turn up even in directories that are owned by a single user and their primary group. It's so bad that those people can't even create new files.
users (community users or paid customers equally) and will do our best to keep you happy. Please file a bug report with as much history as possible and we will have it assigned on priority.
I already mentioned the bugs that seem to describe the same problems. I really do not think that creating new ones describing the same problems would help. Maybe the old ones should be reopened. These bugs mentioned in: http://gluster.org/pipermail/gluster-users/2011-May/007619.html are basically the same.

Currently I really do not know how to describe/analyze the problem further.
Unfortunately I need to come up with a solution ASAP. Our institute is a pretty busy one; currently there are a lot of experiments going on. Returning to the old servers is not really an option. We do not have the free space to accommodate all the data.

--udo.
--
---[ Institute of Cognitive Science @ University of Osnabrueck
---[ Albrechtstrasse 28, D-49076 Osnabrueck, 969-3362
---[ Documentation: https://doc.ikw.uni-osnabrueck.de
Andre Felipe Machado
2011-05-19 14:35:14 UTC
Permalink
Hello,
It seems that QA test cases are somewhat different from the real-world deployments
by the users on this list.

It may be more productive to have wiki pages where we, the users, describe in
enough detail (config files, hardware and network, workload profiles,
etc.) how we are using glusterfs.

Then, the QA Team will have a good starting point to write their performance
scripts and regression tests, for example.
Likely, some users will help by running these scripts and regression tests on
their hardware and network, collecting logs and helping to improve stability and
reliability.

How would these wiki pages be organized?
Could the QA Team create a skeleton of wiki pages tree to organize how we will
write our deployment info and workload profiles in a format useful for them?

The stability and reliability issues are the blocking factors for a government-size
deployment we are evaluating. Performance for small files (1k to 20k) is
a "second level" issue. But this affects the alternatives too.

Regards.
Andre Felipe
http://www.techforce.com.br
Burnash, James
2011-05-19 14:50:30 UTC
Permalink
Excellent point, and fabulous constructive idea. I'll certainly participate.

Jérémie Tarot
2011-05-19 15:13:52 UTC
Permalink
Hi,
Post by Andre Felipe Machado
Hello,
It seems that QA test cases are somewhat different than real world deployments
by the users at this list.
It may be more productive to have wiki pages where we, the users, describe with
very much enough details (config files, hardware and network, workload profiles,
etc) how we are using glusterfs.
Then, the QA Team will have a good starting point to write their performance
scripts and regression tests, for example.
Likely, some users will help running these scripts and regression tests over
their hardware and network, collecting logs helping to improve stability and
reliability.
How would these wiki pages be organized?
Could the QA Team create a skeleton of wiki pages tree to organize how we will
write our deployment info and workload profiles in a format useful for them?
Maybe this can be inspiring and helpful:

http://testcases.qa.ubuntu.com/

Bests
Jeremie
Dan Bretherton
2011-05-19 17:19:56 UTC
Permalink
Message: 2 - Date: Wed, 18 May 2011 19:00:30 +0200 - From: Udo Waechter
On 18.05.2011, at [...]:
I'm actually thinking of downgrading to 3.1.3 from 3.2.0. Wonder if I'd have any ill-effects on the volume with a simple rpm downgrade and daemon restart.
I read somewhere in the docs that you need to reset the volume option beforehand:
gluster volume reset <volname>
good luck. Would be nice to hear if it worked for you.
--udo.
Hello All- A few words of warning about downgrading, after what happened
to me when I tried it.

I downgraded from 3.2 to 3.1.4, but I am back on 3.2 again now because
the downgrade broke the rebalancing feature. I thought this might have
been due to version 3.2 having done something to the xattrs. I tried
downgrading to 3.1.3 and 3.1.2 as well, but rebalance was also not
working in those versions, having worked successfully in the past.

I found that the downgrade didn't go as smoothly as the upgrades usually
do. After downgrading the RPMs on the servers and restarting glusterd,
I couldn't mount the volumes, and the client logs were flooded with
errors like these for each server.

[2011-05-03 18:05:26.563591] E
[client-handshake.c:1101:client_query_portmap_cbk] 0-atmos-client-1:
failed to get the port number for remote subvolume
[2011-05-03 18:05:26.564543] I [client.c:1601:client_rpc_notify]
0-atmos-client-1: disconnected

I didn't need to reset the volumes after downgrading because none of the
volume files had been created or reset under version 3.2. Despite that
I did try doing "gluster volume reset <volname>" for all the volumes,
but it didn't stop the client log errors or solve the mounting problems.
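(Doing this "for all the volumes" can be as simple as a loop like the one below; only "atmos" is a real volume name taken from the log lines above, the other names are placeholders:)

for vol in atmos home scratch; do      # substitute your own volume names
    gluster volume reset $vol
done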

In desperation I unmounted all the volumes from the clients and shut down
all the gluster related processes on all the servers. After waiting a
few minutes for any locked ports to clear (in case locked ports had been
causing the problems after the RPM downgrades) I restarted glusterd on
the servers, and then a few minutes later I was able to mount the
volumes again. I discovered that I could no longer rebalance
(fix-layout or migrate-data) a few days later.
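(Roughly, that recovery sequence looks like this on the command line; the mount point, volume name, server name and init-script name are assumptions:)

# on every client
umount /mnt/atmos
# on every server: stop glusterd and any remaining gluster processes
service glusterd stop
killall glusterfsd glusterfs 2>/dev/null
# wait a few minutes for any locked ports to clear, then restart
service glusterd start
# a few minutes later, on the clients again
mount -t glusterfs server1:/atmos /mnt/atmos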


To answer an earlier question, I am using 3.2 in a production
environment, although in the light of recent discussions in this thread
I wish I wasn't. Having said that, my users haven't reported any
problems nearly a week after the upgrade, so I am hoping that we won't
be affected by any of the issues that have been causing problems at
other sites.

-Dan.
Anthony J. Biacco
2011-05-19 17:30:03 UTC
Permalink
My downgrade to 3.1.4 went OK; I did do the volume reset from the start.
Like you said, not as easy as an upgrade, but I wasn't expecting it to be.
The key for me was stopping the daemon on the primary server, removing
the peer files, restarting the daemon. Then shut down the daemon on the
secondary servers, remove all the glusterd config files, restart the
daemon, then do a peer probe from the primary for all the secondaries (I
had only one).
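(Spelled out, those steps could look roughly like this; the config path /etc/glusterd, the init-script name and the host name are assumptions based on a stock 3.1.x RPM install:)

# on the primary server
service glusterd stop
rm -f /etc/glusterd/peers/*        # remove the peer files
service glusterd start

# on each secondary server
service glusterd stop
rm -rf /etc/glusterd/*             # remove all the glusterd config files
service glusterd start

# back on the primary: re-probe the secondaries
gluster peer probe secondary1.example.com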

-Tony
---------------------------
Manager, IT Operations
Format Dynamics, Inc.
P: 303-228-7327
F: 303-228-7305
***@formatdynamics.com
http://www.formatdynamics.com
Mohit Anchlia
2011-05-19 17:54:49 UTC
Permalink
I also request users facing issues to open bugs if they think something is a
bug. This will help in keeping track of the bugs so that they don't
go unnoticed, at least. It will also help others when they face
similar issues.

Andre Felipe Machado
2011-05-19 18:38:25 UTC
Permalink
Hello,
I created a mockup page [0] that could serve as a starting point for improvements.
I imagined what would be useful for the QA team.
What info does the QA team need?

You could reach them by
Gluster documentation > gluster 3.2 documentation > Gluster Use case deployments
descriptions
[1]

Please, improve the mockup, even the initial pages.

The ubuntu test cases seem to be the RESULT of the QA team efforts from our input.
Good luck.

Andre Felipe


[0]
http://gluster.com/community/documentation/index.php/Gluster_Replicated_use_case_1_mockup

[1]
http://gluster.com/community/documentation/index.php/Gluster_Use_case_deployments_descriptions
Andre Felipe Machado
2011-05-24 15:30:46 UTC
Permalink
Hello,
Is the wiki page tree

Gluster 3.2 documentation > Gluster Use case deployments descriptions > Gluster
Replicated use case 1 mockup [0]

adequate for the QA Team's needs?
Could the pages be replicated to the gluster 3.1 documentation tree?
Do you have improvements to the pages?

Regards.
Andre Felipe Machado
http://www.techforce.com.br


[0]
http://gluster.com/community/documentation/index.php/Gluster_Replicated_use_case_1_mockup