Discussion:
[rsyslog] Enabling $ActionFileEnableSync caused massive increase in write volume (bytes/sec) to NAS - can someone help shed light?
Aaron Nichols
2009-10-20 15:32:07 UTC
Permalink
Hello,
We have two rsyslog "writer" instances which are receiving syslog
messages and writing them to an NFS mount on a NAS. The NAS is connected via
10gb ethernet but has a number of other applications writing to it as well.
Typical write volume for our log writers is 2-3Mbps to the NAS. Yesterday I
attempted to resolve some complaints with logs being "bursty" - events
appear to be written in batches which sounded consistent with buffering on
the rsyslog daemon or not syncing the file with every message. These bursts
could come as infrequently as every 30 seconds however, which seems
excessive. I took a number of steps to remedy this but ultimately had to
reverse the changes because write volume (measured at the disks) to the NAS
went from 2Mbps up to 9Mbps and iops went from 30 up to 900 at the peak. I'm
sure there is an explanation for this - but there are some subtle
configuration changes which had an impact and I don't understand exactly
why.

First, the main change was to set "$ActionFileEnableSync on" in each logging
servers configuration. From what I can tell this globally enables the
ability to turn on file sync and it defaults to "off". This change, combined
with removing the "-" from all the actions in my configuration should have
resulted in all files being synced all the time - probably the worst case
scenario and I'm not all that suprised this caused problems.

Second, before rolling the change back 100% - I removed the
"$ActionFileEnableSync on" configuration from the servers but left the
action configurations without the "-" - assuming that the global
configuration option would completely disable file sync and using a "-"
would have no effect. This did not seem to be the case - it did dramatically
decrease the volume of traffic to the NAS but still the volume was about
double (5Mbps) the normal volume.

Third, I added "-" before all the action lines in the configuration and this
brought volume back down to the 2-3Mbps we are used to.

This leaves me with a few questions:

1) I would expect file sync to increase transactions with the NAS but not
increase the volume of data being written to the NAS as measured at the
disk. Can someone shed light on why this would so profoundly impact write
volume? Load on the rsyslog servers also went down substantially after
making this change - presumably because there were fewer queued transactions
- not sure.

2) Is there some difference between what "$ActionFileEnableSync on" does vs.
adding a "-" before an action other than one is global and one is
per-action? I thought these were just two different levels of granularity
for configuring the same thing but that doesn't appear to be the case.

3) Is it expected to have rsyslog take up to 30 seconds to flush messages to
a logfile? These are BUSY logs - being written to multiple times per second,
so it's not a delay on the client. I can observe these same clients logging
messages to another logging server (the old server) running syslog-ng and
messages are observable in near real-time.

Ultimate - #3 is the problem I am trying to solve, but I'm very curious why
these changes had the impact they did and why globally disabling file sync
using "$ActionFileEnableSync" would not entire decrease log volume until I
add "-" to all the actions.

Let me know if I can add info that would help shed light on this.

Thanks,
Aaron
Rainer Gerhards
2009-10-20 16:32:30 UTC
Permalink
Hi Aaron,

first and very important question: which version of rsyslog are you using?

Rainer
-----Original Message-----
From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog-
bounces at lists.adiscon.com] On Behalf Of Aaron Nichols
Sent: Tuesday, October 20, 2009 5:32 PM
To: rsyslog-users
Subject: [rsyslog] Enabling $ActionFileEnableSync caused massive
increase inwrite volume (bytes/sec) to NAS - can someone help shed
light?
Hello,
We have two rsyslog "writer" instances which are receiving syslog
messages and writing them to an NFS mount on a NAS. The NAS is
connected via
10gb ethernet but has a number of other applications writing to it as well.
Typical write volume for our log writers is 2-3Mbps to the NAS. Yesterday I
attempted to resolve some complaints with logs being "bursty" - events
appear to be written in batches which sounded consistent with buffering on
the rsyslog daemon or not syncing the file with every message. These bursts
could come as infrequently as every 30 seconds however, which seems
excessive. I took a number of steps to remedy this but ultimately had to
reverse the changes because write volume (measured at the disks) to the NAS
went from 2Mbps up to 9Mbps and iops went from 30 up to 900 at the peak. I'm
sure there is an explanation for this - but there are some subtle
configuration changes which had an impact and I don't understand exactly
why.
First, the main change was to set "$ActionFileEnableSync on" in each logging
servers configuration. From what I can tell this globally enables the
ability to turn on file sync and it defaults to "off". This change, combined
with removing the "-" from all the actions in my configuration should have
resulted in all files being synced all the time - probably the worst case
scenario and I'm not all that suprised this caused problems.
Second, before rolling the change back 100% - I removed the
"$ActionFileEnableSync on" configuration from the servers but left the
action configurations without the "-" - assuming that the global
configuration option would completely disable file sync and using a "-"
would have no effect. This did not seem to be the case - it did dramatically
decrease the volume of traffic to the NAS but still the volume was about
double (5Mbps) the normal volume.
Third, I added "-" before all the action lines in the configuration and this
brought volume back down to the 2-3Mbps we are used to.
1) I would expect file sync to increase transactions with the NAS but not
increase the volume of data being written to the NAS as measured at the
disk. Can someone shed light on why this would so profoundly impact write
volume? Load on the rsyslog servers also went down substantially after
making this change - presumably because there were fewer queued transactions
- not sure.
2) Is there some difference between what "$ActionFileEnableSync on" does vs.
adding a "-" before an action other than one is global and one is
per-action? I thought these were just two different levels of
granularity
for configuring the same thing but that doesn't appear to be the case.
3) Is it expected to have rsyslog take up to 30 seconds to flush messages to
a logfile? These are BUSY logs - being written to multiple times per second,
so it's not a delay on the client. I can observe these same clients logging
messages to another logging server (the old server) running syslog-ng and
messages are observable in near real-time.
Ultimate - #3 is the problem I am trying to solve, but I'm very curious why
these changes had the impact they did and why globally disabling file sync
using "$ActionFileEnableSync" would not entire decrease log volume until I
add "-" to all the actions.
Let me know if I can add info that would help shed light on this.
Thanks,
Aaron
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com
Aaron Nichols
2009-10-20 16:54:02 UTC
Permalink
On Tue, Oct 20, 2009 at 10:32 AM, Rainer Gerhards
Post by Rainer Gerhards
Hi Aaron,
first and very important question: which version of rsyslog are you using?
Of course - I forgot to include that, sorry - 4.4.1. Running on RHEL 5.3.
Rainer Gerhards
2009-10-20 16:58:19 UTC
Permalink
-----Original Message-----
From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog-
bounces at lists.adiscon.com] On Behalf Of Aaron Nichols
Sent: Tuesday, October 20, 2009 6:54 PM
To: rsyslog-users
Subject: Re: [rsyslog] Enabling $ActionFileEnableSync caused
massiveincrease inwrite volume (bytes/sec) to NAS - can someone
helpshed light?
On Tue, Oct 20, 2009 at 10:32 AM, Rainer Gerhards
Post by Rainer Gerhards
Hi Aaron,
first and very important question: which version of rsyslog are you
using?
Of course - I forgot to include that, sorry - 4.4.1. Running on RHEL 5.3.
Hmmm... OK, so this is a version without output batching support (v5+) and
without background writer (current v4-beta). So I do not really see how
rsyslog could request this writing (other, of course, than by using the sync
calls). Could you create a debug log? I'd try to dig through it. But I'd
actually think that it has something to do with the way NFS parameters are
set.

Rainer
Aaron Nichols
2009-10-20 18:41:44 UTC
Permalink
On Tue, Oct 20, 2009 at 10:58 AM, Rainer Gerhards
Post by Rainer Gerhards
Hmmm... OK, so this is a version without output batching support (v5+) and
without background writer (current v4-beta). So I do not really see how
rsyslog could request this writing (other, of course, than by using the sync
calls). Could you create a debug log? I'd try to dig through it. But I'd
actually think that it has something to do with the way NFS parameters are
set.
Here are the NFS mount options for the nodes writing to the NFS volume. The
NFS server is an EMC Celerra - I can get any specific parameters there are
questions about but can't easily provide a dump of the configured options:

hostname:/log/syslog /log/syslog nfs
rw,vers=3,rsize=32768,wsize=32768,hard,intr,proto=tcp,timeo=600,retrans=2,sec=sys,addr=hostname
0 0

I will work on a debug log but as these are production systems it'll take a
little time.

Thanks,
Aaron
Aaron Nichols
2009-10-20 20:37:43 UTC
Permalink
Also - can someone just confirm that my assumption is correct about the
expected behavior? Should the "$ActionFileEnableSync [on/off]" option enable
the ABILITY to turn on file sync per action using "-" or does it do
something different?

If I have "$ActionFileEnableSync off" set - should I expect adding or
removing "-" before an action to have any impact at all? The documentation
just associates these two options with enabling/disabling "file sync" but it
seems like there are some additional subtleties that aren't clear.

Thanks,
Aaron
On Tue, Oct 20, 2009 at 10:58 AM, Rainer Gerhards <
Post by Rainer Gerhards
Hmmm... OK, so this is a version without output batching support (v5+) and
without background writer (current v4-beta). So I do not really see how
rsyslog could request this writing (other, of course, than by using the sync
calls). Could you create a debug log? I'd try to dig through it. But I'd
actually think that it has something to do with the way NFS parameters are
set.
Rainer Gerhards
2009-10-21 07:38:19 UTC
Permalink
-----Original Message-----
From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog-
bounces at lists.adiscon.com] On Behalf Of Aaron Nichols
Sent: Tuesday, October 20, 2009 10:38 PM
To: rsyslog-users
Subject: Re: [rsyslog] Enabling $ActionFileEnableSync caused
massiveincreaseinwrite volume (bytes/sec) to NAS - can someone helpshed
light?
Also - can someone just confirm that my assumption is correct about the
expected behavior? Should the "$ActionFileEnableSync [on/off]" option enable
the ABILITY to turn on file sync per action using "-" or does it do
something different?
See my other note, but there may be a subtle misunderstanding: The presence
of "-" turns OFF file syncing. Absence of "-" causes a sync! So in

*.* /path/to/fileWithSync
*.* -/path/to/fileWithoutSync

fileWithSync is being synced and fileWithoutSync is not!

This is traditional config file syntax and probably the main reason why *a
lot* of real-world config files do syncs...

Rainer
Aaron Nichols
2009-10-21 13:17:49 UTC
Permalink
On Wed, Oct 21, 2009 at 1:38 AM, Rainer Gerhards
Post by Rainer Gerhards
See my other note, but there may be a subtle misunderstanding: The presence
of "-" turns OFF file syncing. Absence of "-" causes a sync! So in
*.* /path/to/fileWithSync
*.* -/path/to/fileWithoutSync
fileWithSync is being synced and fileWithoutSync is not!
This is traditional config file syntax and probably the main reason why *a
lot* of real-world config files do syncs...
I understood correctly and now I'm certain I know how it should work. I need
to do some more testing and I think I have enough information to do that and
will come back here when I have something interesting to share. An
observation I did make when this was happening was that if
$ActionFileEnableSync was disabled (commented out) but the "-" was missing
before the actions in the configuration, write volume to the NAS was higher
than when we had $ActionFileEnableSync disabled and had the "-" added before
the action (write volume was 100% higher if $ActionFileEnableSync was OFF
and the "-" option was missing from all log actions than if
$ActionFileEnableSync was OFF and "-" was included in all log actions.)

This led to my original suspicion that the two options somehow change sync
behavior independently. If they shouldn't, then I need to get the debug log
and do some analysis.

I appreciate the guidance - I'll gather some more details.

Thanks,
Aaron
Rainer Gerhards
2009-10-21 13:22:29 UTC
Permalink
-----Original Message-----
From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog-
bounces at lists.adiscon.com] On Behalf Of Aaron Nichols
Sent: Wednesday, October 21, 2009 3:18 PM
To: rsyslog-users
Subject: Re: [rsyslog] Enabling $ActionFileEnableSync
causedmassiveincreaseinwrite volume (bytes/sec) to NAS - cansomeone
helpshed light?
On Wed, Oct 21, 2009 at 1:38 AM, Rainer Gerhards
Post by Rainer Gerhards
See my other note, but there may be a subtle misunderstanding: The
presence
Post by Rainer Gerhards
of "-" turns OFF file syncing. Absence of "-" causes a sync! So in
*.* /path/to/fileWithSync
*.* -/path/to/fileWithoutSync
fileWithSync is being synced and fileWithoutSync is not!
This is traditional config file syntax and probably the main reason
why *a
Post by Rainer Gerhards
lot* of real-world config files do syncs...
I understood correctly and now I'm certain I know how it should work. I need
to do some more testing and I think I have enough information to do that and
will come back here when I have something interesting to share. An
observation I did make when this was happening was that if
$ActionFileEnableSync was disabled (commented out) but the "-" was missing
before the actions in the configuration, write volume to the NAS was higher
than when we had $ActionFileEnableSync disabled and had the "-" added before
the action (write volume was 100% higher if $ActionFileEnableSync was OFF
and the "-" option was missing from all log actions than if
$ActionFileEnableSync was OFF and "-" was included in all log actions.)
That indeed sounds odd. Are you sure the $ActionFileEnableSync is in front of
all write statements? On some systems, write statements may be included via
files and if these are in front of the directive, those would be unaffected
by it...

Rainer
Aaron Nichols
2009-10-21 14:25:21 UTC
Permalink
On Wed, Oct 21, 2009 at 7:22 AM, Rainer Gerhards
Post by Rainer Gerhards
That indeed sounds odd. Are you sure the $ActionFileEnableSync is in front of
all write statements? On some systems, write statements may be included via
files and if these are in front of the directive, those would be unaffected
by it...
We do also use includes but in this case $ActionFileEnableSync was at the
top of the included file with all write statements below it. I will also
note that the write volume changed significantly with two different config
changes. To uber-simplify my original email:

"normal" volume - $ActionFileEnableSync off & all actions include "-" in
front : 2-3mbps ~30 iops
$ActionFileEnableSync off & all actions exclude "-" in front : 5mbps (~100%
increase) (iops unknown)
$ActionFileEnableSync on & all actions exclude "-" in front : 9mbps
(~300-400% increase) ~900 iops

I am using a mix of static and dynamic filename actions and some actions use
format templates as well. I suppose there could be one of those combinations
which is not honoring the global $ActionFileEnableSync configuration. Again,
I need to do some testing because at this point I'm just making guesses
without much information.

Thanks,
Aaron
Rainer Gerhards
2009-10-21 15:32:59 UTC
Permalink
-----Original Message-----
From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog-
bounces at lists.adiscon.com] On Behalf Of Aaron Nichols
Sent: Wednesday, October 21, 2009 4:25 PM
To: rsyslog-users
Subject: Re: [rsyslog] Enabling
$ActionFileEnableSynccausedmassiveincreaseinwrite volume (bytes/sec) to
NAS -cansomeone helpshed light?
On Wed, Oct 21, 2009 at 7:22 AM, Rainer Gerhards
Post by Rainer Gerhards
That indeed sounds odd. Are you sure the $ActionFileEnableSync is in
front
Post by Rainer Gerhards
of
all write statements? On some systems, write statements may be
included via
Post by Rainer Gerhards
files and if these are in front of the directive, those would be
unaffected
Post by Rainer Gerhards
by it...
We do also use includes but in this case $ActionFileEnableSync was at the
top of the included file with all write statements below it. I will also
note that the write volume changed significantly with two different config
"normal" volume - $ActionFileEnableSync off & all actions include "-" in
front : 2-3mbps ~30 iops
$ActionFileEnableSync off & all actions exclude "-" in front : 5mbps (~100%
increase) (iops unknown)
$ActionFileEnableSync on & all actions exclude "-" in front : 9mbps
(~300-400% increase) ~900 iops
I am using a mix of static and dynamic filename actions and some actions use
format templates as well. I suppose there could be one of those combinations
which is not honoring the global $ActionFileEnableSync configuration.
I may be overlooking something here, but the check is very straightforward
and I don't see why some should be affected but others not. This is the file
(and spot) in question:

http://git.adiscon.com/?p=rsyslog.git;a=blob;f=tools/omfile.c;h=bb12b4b61dacb
84167b9b0d17e49a597342dddc0;hb=07b076ddcfed4ea9b447d0be574d1dcdb799bc2f#l623

Do a search for "bEnableSync" to see all occurences.

But the debug log will show...

Rainer
Again,
I need to do some testing because at this point I'm just making guesses
without much information.
Thanks,
Aaron
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com
Rainer Gerhards
2009-10-23 06:21:01 UTC
Permalink
Aaron,

I am replying on-list so that others can follow (and I hope they have some
advise). I have stripped the sensitive part of your message. In short for all
others: I got a debug log and have now reviewed it.
During this debug run we saw NAS write volume increase from a "normal"
2-3mbps and 35 iops up to 500 iops and 5.6mbps toward the end of the
test. I do understand how enabling file sync will increase iops but
I'm unclear on why the volume of data being written would increase so
much. If rsyslog has file sync "disabled" then I would expect bursts
of data and if file sync is "enabled" I would expect the same amount
of data to be written in much more frequent, but also much smaller
chunks. NFS could be a culprit here, but I'm curious what NFS
parameters would impact this behavior.
The debug log does not show anything that looks specifically suspect. The
incoming traffic is intense and the default queue size is overrun quickly,
resulting in voluntary delays of the sender (via tcp). Looking at the
traffic, I think this is perfectly legal and increasing the queue size would
not bring any benefit (I think the end result would just be a larger memory
footprint without any advantage).

I do not see any unusual write behavior. Of course, the debug log could be
more verbose as it does not specifically tell about the syncs (usually, the
log is missing some information you'd like to see for a given instant, but
you can't include everything...). But looking at the code base plus the log
really does not reveal anything suspicious.

One thing you might want to try is doing an strace so that we can look at it
and see which exact OS calls are made and how long they last.
For the record, changing the sync behavior of rsyslog DOES fix the
original complaint. When we have file sync disabled we see up to 30
second pauses between writes to files which get multiple messages per
second delivered to the rsyslog host.
There is nothing in the log that points to rsyslog doing these delays.

So my conclusion really is that you need to look at the NFS layer. To me, it
looks like NFS does some local caching, and maybe 30 seconds is the timeout
at which NFS writes data from files that are constantly being written in
short intervals. With sync, NFS probably writes and rewrites each block as
David has said, thus resulting in the far larger i/o footprint. But I am far
from being a NFS expert, so it would probably make sense to ask one ;)
I have looked at the option
$OMFileFlushOnTXEnd and wonder if it would be a preferred method to
resolve this in the 4.1.1 version (or if it even works there).
This makes no sense for v4 (I think it is not even available).

Sorry I have no better answer. If get a strace, I'd gladly look at it.

Rainer
david
2009-10-23 06:32:05 UTC
Permalink
Post by Rainer Gerhards
For the record, changing the sync behavior of rsyslog DOES fix the
original complaint. When we have file sync disabled we see up to 30
second pauses between writes to files which get multiple messages per
second delivered to the rsyslog host.
There is nothing in the log that points to rsyslog doing these delays.
So my conclusion really is that you need to look at the NFS layer. To me, it
looks like NFS does some local caching, and maybe 30 seconds is the timeout
at which NFS writes data from files that are constantly being written in
short intervals. With sync, NFS probably writes and rewrites each block as
David has said, thus resulting in the far larger i/o footprint. But I am far
from being a NFS expert, so it would probably make sense to ask one ;)
this reminded me of something.

a standard complient NFS implementation would require that any write to a
NFS volume go all the way to the destination and be safe on disk before
the write call returns to the calling program.

this doesn't mean when you do a fsync, it means each individual write
call.

for the program to think the data has been written and it not be available
on the NFS drive to other systems there has to be some non-standard
caching in place here.

note that unless you have a high-end NFS server that includes some
battery-backed ram on it, standard complient behavior also means a very
low write transaction rate.

note that the caching could be on the reader machines instead of on the
writer, and it could be something like caching of the metadata (including
size and last modified timestamp)

David Lang
Alan T DeKok
2009-10-23 06:39:35 UTC
Permalink
Post by Rainer Gerhards
So my conclusion really is that you need to look at the NFS layer.
Logs should NEVER be written to an NFS mount.

David gave a good explanation as to why. But the above prohibition
should be made clear in the docs, if it isn't already.

If the NFS server goes away, then the write transaction rate will go
to zero. This likely isn't what people want from a syslog server.

There is already a way to get syslog data from one system to another:
the syslog protocol. Using NFS as a replacement for syslog is wrong on
many levels.

Alan DeKok.
Rainer Gerhards
2009-10-23 07:03:05 UTC
Permalink
I think the main point of the OP was that he logs to a NAS device, so I do
not see way to write to it without going through a network share.

Other than that, I agree to your statement.

Rainer
-----Original Message-----
From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog-
bounces at lists.adiscon.com] On Behalf Of Alan T DeKok
Sent: Friday, October 23, 2009 8:40 AM
To: rsyslog-users
Subject: Re: [rsyslog] Enabling
$ActionFileEnableSynccausedmassiveincreaseinwrite volume (bytes/sec) to
NAS -cansomeone helpshed light?
Post by Rainer Gerhards
So my conclusion really is that you need to look at the NFS layer.
Logs should NEVER be written to an NFS mount.
David gave a good explanation as to why. But the above prohibition
should be made clear in the docs, if it isn't already.
If the NFS server goes away, then the write transaction rate will go
to zero. This likely isn't what people want from a syslog server.
the syslog protocol. Using NFS as a replacement for syslog is wrong on
many levels.
Alan DeKok.
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com
david
2009-10-23 07:17:43 UTC
Permalink
Post by Rainer Gerhards
I think the main point of the OP was that he logs to a NAS device, so I do
not see way to write to it without going through a network share.
ahh, have we been assuming that this is NFS when all he said was NAS?

if so, please clarify what protocol you are using to talk to the NAS. it
could make a huge difference here.

David Lang
Post by Rainer Gerhards
Other than that, I agree to your statement.
Rainer
-----Original Message-----
From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog-
bounces at lists.adiscon.com] On Behalf Of Alan T DeKok
Sent: Friday, October 23, 2009 8:40 AM
To: rsyslog-users
Subject: Re: [rsyslog] Enabling
$ActionFileEnableSynccausedmassiveincreaseinwrite volume (bytes/sec) to
NAS -cansomeone helpshed light?
Post by Rainer Gerhards
So my conclusion really is that you need to look at the NFS layer.
Logs should NEVER be written to an NFS mount.
David gave a good explanation as to why. But the above prohibition
should be made clear in the docs, if it isn't already.
If the NFS server goes away, then the write transaction rate will go
to zero. This likely isn't what people want from a syslog server.
the syslog protocol. Using NFS as a replacement for syslog is wrong on
many levels.
Alan DeKok.
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com
Rainer Gerhards
2009-10-23 07:53:32 UTC
Permalink
-----Original Message-----
From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog-
bounces at lists.adiscon.com] On Behalf Of david at lang.hm
Sent: Friday, October 23, 2009 9:18 AM
To: rsyslog-users
Subject: Re: [rsyslog] Enabling
$ActionFileEnableSynccausedmassiveincreaseinwrite volume (bytes/sec) to
NAS -cansomeone helpshed light?
Post by Rainer Gerhards
I think the main point of the OP was that he logs to a NAS device, so
I do
Post by Rainer Gerhards
not see way to write to it without going through a network share.
ahh, have we been assuming that this is NFS when all he said was NAS?
No, he said NFS, at least in the last message (not sure about the beginning
of this story...)

Rainer
if so, please clarify what protocol you are using to talk to the NAS. it
could make a huge difference here.
David Lang
Post by Rainer Gerhards
Other than that, I agree to your statement.
Rainer
-----Original Message-----
From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog-
bounces at lists.adiscon.com] On Behalf Of Alan T DeKok
Sent: Friday, October 23, 2009 8:40 AM
To: rsyslog-users
Subject: Re: [rsyslog] Enabling
$ActionFileEnableSynccausedmassiveincreaseinwrite volume (bytes/sec)
to
Post by Rainer Gerhards
NAS -cansomeone helpshed light?
Post by Rainer Gerhards
So my conclusion really is that you need to look at the NFS layer.
Logs should NEVER be written to an NFS mount.
David gave a good explanation as to why. But the above
prohibition
Post by Rainer Gerhards
should be made clear in the docs, if it isn't already.
If the NFS server goes away, then the write transaction rate will
go
Post by Rainer Gerhards
to zero. This likely isn't what people want from a syslog server.
There is already a way to get syslog data from one system to
the syslog protocol. Using NFS as a replacement for syslog is wrong
on
Post by Rainer Gerhards
many levels.
Alan DeKok.
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com
Rainer Gerhards
2009-10-23 07:53:32 UTC
Permalink
-----Original Message-----
From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog-
bounces at lists.adiscon.com] On Behalf Of david at lang.hm
Sent: Friday, October 23, 2009 9:18 AM
To: rsyslog-users
Subject: Re: [rsyslog] Enabling
$ActionFileEnableSynccausedmassiveincreaseinwrite volume (bytes/sec) to
NAS -cansomeone helpshed light?
Post by Rainer Gerhards
I think the main point of the OP was that he logs to a NAS device, so
I do
Post by Rainer Gerhards
not see way to write to it without going through a network share.
ahh, have we been assuming that this is NFS when all he said was NAS?
No, he said NFS, at least in the last message (not sure about the beginning
of this story...)

Rainer
if so, please clarify what protocol you are using to talk to the NAS. it
could make a huge difference here.
David Lang
Post by Rainer Gerhards
Other than that, I agree to your statement.
Rainer
-----Original Message-----
From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog-
bounces at lists.adiscon.com] On Behalf Of Alan T DeKok
Sent: Friday, October 23, 2009 8:40 AM
To: rsyslog-users
Subject: Re: [rsyslog] Enabling
$ActionFileEnableSynccausedmassiveincreaseinwrite volume (bytes/sec)
to
Post by Rainer Gerhards
NAS -cansomeone helpshed light?
Post by Rainer Gerhards
So my conclusion really is that you need to look at the NFS layer.
Logs should NEVER be written to an NFS mount.
David gave a good explanation as to why. But the above
prohibition
Post by Rainer Gerhards
should be made clear in the docs, if it isn't already.
If the NFS server goes away, then the write transaction rate will
go
Post by Rainer Gerhards
to zero. This likely isn't what people want from a syslog server.
There is already a way to get syslog data from one system to
the syslog protocol. Using NFS as a replacement for syslog is wrong
on
Post by Rainer Gerhards
many levels.
Alan DeKok.
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com
Rainer Gerhards
2009-10-23 07:53:32 UTC
Permalink
-----Original Message-----
From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog-
bounces at lists.adiscon.com] On Behalf Of david at lang.hm
Sent: Friday, October 23, 2009 9:18 AM
To: rsyslog-users
Subject: Re: [rsyslog] Enabling
$ActionFileEnableSynccausedmassiveincreaseinwrite volume (bytes/sec) to
NAS -cansomeone helpshed light?
Post by Rainer Gerhards
I think the main point of the OP was that he logs to a NAS device, so
I do
Post by Rainer Gerhards
not see way to write to it without going through a network share.
ahh, have we been assuming that this is NFS when all he said was NAS?
No, he said NFS, at least in the last message (not sure about the beginning
of this story...)

Rainer
if so, please clarify what protocol you are using to talk to the NAS. it
could make a huge difference here.
David Lang
Post by Rainer Gerhards
Other than that, I agree to your statement.
Rainer
-----Original Message-----
From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog-
bounces at lists.adiscon.com] On Behalf Of Alan T DeKok
Sent: Friday, October 23, 2009 8:40 AM
To: rsyslog-users
Subject: Re: [rsyslog] Enabling
$ActionFileEnableSynccausedmassiveincreaseinwrite volume (bytes/sec)
to
Post by Rainer Gerhards
NAS -cansomeone helpshed light?
Post by Rainer Gerhards
So my conclusion really is that you need to look at the NFS layer.
Logs should NEVER be written to an NFS mount.
David gave a good explanation as to why. But the above
prohibition
Post by Rainer Gerhards
should be made clear in the docs, if it isn't already.
If the NFS server goes away, then the write transaction rate will
go
Post by Rainer Gerhards
to zero. This likely isn't what people want from a syslog server.
There is already a way to get syslog data from one system to
the syslog protocol. Using NFS as a replacement for syslog is wrong
on
Post by Rainer Gerhards
many levels.
Alan DeKok.
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com
Rainer Gerhards
2009-10-23 07:53:32 UTC
Permalink
-----Original Message-----
From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog-
bounces at lists.adiscon.com] On Behalf Of david at lang.hm
Sent: Friday, October 23, 2009 9:18 AM
To: rsyslog-users
Subject: Re: [rsyslog] Enabling
$ActionFileEnableSynccausedmassiveincreaseinwrite volume (bytes/sec) to
NAS -cansomeone helpshed light?
Post by Rainer Gerhards
I think the main point of the OP was that he logs to a NAS device, so
I do
Post by Rainer Gerhards
not see way to write to it without going through a network share.
ahh, have we been assuming that this is NFS when all he said was NAS?
No, he said NFS, at least in the last message (not sure about the beginning
of this story...)

Rainer
if so, please clarify what protocol you are using to talk to the NAS. it
could make a huge difference here.
David Lang
Post by Rainer Gerhards
Other than that, I agree to your statement.
Rainer
-----Original Message-----
From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog-
bounces at lists.adiscon.com] On Behalf Of Alan T DeKok
Sent: Friday, October 23, 2009 8:40 AM
To: rsyslog-users
Subject: Re: [rsyslog] Enabling
$ActionFileEnableSynccausedmassiveincreaseinwrite volume (bytes/sec)
to
Post by Rainer Gerhards
NAS -cansomeone helpshed light?
Post by Rainer Gerhards
So my conclusion really is that you need to look at the NFS layer.
Logs should NEVER be written to an NFS mount.
David gave a good explanation as to why. But the above
prohibition
Post by Rainer Gerhards
should be made clear in the docs, if it isn't already.
If the NFS server goes away, then the write transaction rate will
go
Post by Rainer Gerhards
to zero. This likely isn't what people want from a syslog server.
There is already a way to get syslog data from one system to
the syslog protocol. Using NFS as a replacement for syslog is wrong
on
Post by Rainer Gerhards
many levels.
Alan DeKok.
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com
Alan T DeKok
2009-10-23 07:42:22 UTC
Permalink
Post by Rainer Gerhards
I think the main point of the OP was that he logs to a NAS device, so I do
not see way to write to it without going through a network share.
Configure rsyslog to write to local disk, and rsync periodically to
the network share. Or, use another instance of rsyslog to copy from
local disk to the network share.

Any other method will have problems. Even NetApp systems have issues
from time to time.

Alan DeKok.
david
2009-10-23 08:52:02 UTC
Permalink
Post by Alan T DeKok
Post by Rainer Gerhards
I think the main point of the OP was that he logs to a NAS device, so I do
not see way to write to it without going through a network share.
Configure rsyslog to write to local disk, and rsync periodically to
the network share. Or, use another instance of rsyslog to copy from
local disk to the network share.
Any other method will have problems. Even NetApp systems have issues
from time to time.
I've found that for every situation I've ever heard someone (including me)
say 'never do that' that there is _some_ situaton where 'that' is exactly
the right thing to do ;-)

I agree that in almost all cases, writing logs to NFS is not a good idea.

you really will need a high-end NFS server to have any chance of it
working (and the load that the logging will put on the server is going to
be significant, unless something does caching, which is what is causing
your problems) so this is a _very_ expensive way to go to make logging
work.

if you are putting the logs on NFS to make them readable by a bunch of
machines, consider running rsyslog on a linux system and let that linux
system export the log directory via NFS. for read-only access from many
machines, this is probably going to rivel the speed of a very expensive
NFS server, while being much simpler, cheaper, and more reliable (if you
need redundant hardware things get more complicated, but it's still not
hard to make it reliable and it will definantly be much cheaper)

David Lang
Aaron Nichols
2009-10-23 13:22:24 UTC
Permalink
Thank you everyone for your responses - rather than reply to each
post, here is some feedback.

The storage system - by all measure we have a "high end" system - It's
an EMC Clarion CX3-80 attached to a Celerra NS80 NAS head with 10gb
interfaces. The network file system is NFS, the Celerra can also
present CIFS shares but I doubt those would work any better. This is
not easily replaced with a linux box so that I can do rsync. The logs
must be made available to multiple hosts and our legacy implementation
*is* a Linux box writing to direct attached disk sharing NFS... it is
crumbling under our write/read load so we had to come up with another
solution. This solution is understood to be a bit of a compromise -
the ideal situation would be direct attached fiberchannel disk and a
cluster filesystem but that wasn't an option - mostly for cost. For
all the complaints about NFS it performs just fine - if we enable file
sync the write load just goes through the roof, but the NAS keeps up.

Client bufferring (or throttling) is still possible however because we
are transitioning to this new system many of these systems are
dual-logging to two destinations so their logging behavior can be
observed on two different systems. On the old system running syslog-ng
these messages are written to disk and visible near real-time and they
don't seem to be doing any rate limiting or otherwise. The debug log
that was taken however had file sync enable - and when that's enabled
this delay is not seen so I'm not suprised it wasn't visible. I can
produce a debug log with file sync disabled for comparison, that's
probably something I should have done in the first place.

I have tried an strace on the rsyslog process(es) when running
normally and haven't been able to get much out of them but that could
be my own limited knowledge of strace. `strace -p <pid>` yields a line
of information and then just stops, even though the rsyslog process is
still working fine and is obviously doing something. I've tried
attaching to the different threads as well with the same result.

I will see if I can get a comparison debug log with file sync disabled
& if there is some guidance on getting a good strace I'm happy to
provide that output. Yesterday we had to move the server from RHEL 5.3
to RHEL 5.1 due to some discovered limits in tcp connections - either
intentionally or unintentionally different in our installation of 5.3.
The server is now performing (with file sync disabled) at an even
higher capacity on RH5.1 however I haven't had a chance to review
whether the write latency is still observed - hopefully today I can
look at that.

Thanks again for all the information.

Aaron
Rainer Gerhards
2009-10-23 13:31:18 UTC
Permalink
-----Original Message-----
From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog-
bounces at lists.adiscon.com] On Behalf Of Aaron Nichols
Sent: Friday, October 23, 2009 3:22 PM
To: rsyslog-users
Subject: Re: [rsyslog]
Enabling$ActionFileEnableSynccausedmassiveincreaseinwrite
volume(bytes/sec) to NAS -cansomeone helpshed light?
Thank you everyone for your responses - rather than reply to each
post, here is some feedback.
The storage system - by all measure we have a "high end" system - It's
an EMC Clarion CX3-80 attached to a Celerra NS80 NAS head with 10gb
interfaces. The network file system is NFS, the Celerra can also
present CIFS shares but I doubt those would work any better. This is
not easily replaced with a linux box so that I can do rsync. The logs
must be made available to multiple hosts and our legacy implementation
*is* a Linux box writing to direct attached disk sharing NFS... it is
crumbling under our write/read load so we had to come up with another
solution. This solution is understood to be a bit of a compromise -
the ideal situation would be direct attached fiberchannel disk and a
cluster filesystem but that wasn't an option - mostly for cost. For
all the complaints about NFS it performs just fine - if we enable file
sync the write load just goes through the roof, but the NAS keeps up.
Client bufferring (or throttling) is still possible however because we
are transitioning to this new system many of these systems are
dual-logging to two destinations so their logging behavior can be
observed on two different systems. On the old system running syslog-ng
these messages are written to disk and visible near real-time and they
don't seem to be doing any rate limiting or otherwise.
Note that the debug run is *much* slower than the actual run. It's the usual
problem, if you instrument the system you also affect it. Performance-wise,
we have seen that a debug build can be up to ten times slower (sometimes
more, depending on config), but even the regular debug log alone, if turned
on, heavily changes performance and order of execution. The root cause is the
many debug outputs, which require at least some mutex snyc and that obviously
affects the overall system...
The debug log
that was taken however had file sync enable - and when that's enabled
this delay is not seen so I'm not suprised it wasn't visible. I can
produce a debug log with file sync disabled for comparison, that's
probably something I should have done in the first place.
From my POV, that actually doesn't help at all. The reason is that I don't
see any unusual in the log, and that means all I can see is that data is
being continously written...
I have tried an strace on the rsyslog process(es) when running
normally and haven't been able to get much out of them but that could
be my own limited knowledge of strace. `strace -p <pid>` yields a line
of information and then just stops, even though the rsyslog process is
still working fine and is obviously doing something. I've tried
attaching to the different threads as well with the same result.
... the strace will probably provide much better info, because here we see
the API calls. One thing I forgot to mention is that you should NOT use it
together with the debug log. The debug output api calls will make the strace
output unreadable.

For example, I have these line in one of my test scripts:

strace -T -tt -f -F tools/rsyslogd -c.....

I don't know out of my head what each option does, but that line may give you
a starting point. The end result is not a single line but rather a complete
log of each API called PLUS how much time the processing took. That, too, is
a quite enormous log...

Rainer
I will see if I can get a comparison debug log with file sync disabled
& if there is some guidance on getting a good strace I'm happy to
provide that output. Yesterday we had to move the server from RHEL 5.3
to RHEL 5.1 due to some discovered limits in tcp connections - either
intentionally or unintentionally different in our installation of 5.3.
The server is now performing (with file sync disabled) at an even
higher capacity on RH5.1 however I haven't had a chance to review
whether the write latency is still observed - hopefully today I can
look at that.
Thanks again for all the information.
Aaron
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com
Rainer Gerhards
2009-10-23 13:31:18 UTC
Permalink
-----Original Message-----
From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog-
bounces at lists.adiscon.com] On Behalf Of Aaron Nichols
Sent: Friday, October 23, 2009 3:22 PM
To: rsyslog-users
Subject: Re: [rsyslog]
Enabling$ActionFileEnableSynccausedmassiveincreaseinwrite
volume(bytes/sec) to NAS -cansomeone helpshed light?
Thank you everyone for your responses - rather than reply to each
post, here is some feedback.
The storage system - by all measure we have a "high end" system - It's
an EMC Clarion CX3-80 attached to a Celerra NS80 NAS head with 10gb
interfaces. The network file system is NFS, the Celerra can also
present CIFS shares but I doubt those would work any better. This is
not easily replaced with a linux box so that I can do rsync. The logs
must be made available to multiple hosts and our legacy implementation
*is* a Linux box writing to direct attached disk sharing NFS... it is
crumbling under our write/read load so we had to come up with another
solution. This solution is understood to be a bit of a compromise -
the ideal situation would be direct attached fiberchannel disk and a
cluster filesystem but that wasn't an option - mostly for cost. For
all the complaints about NFS it performs just fine - if we enable file
sync the write load just goes through the roof, but the NAS keeps up.
Client bufferring (or throttling) is still possible however because we
are transitioning to this new system many of these systems are
dual-logging to two destinations so their logging behavior can be
observed on two different systems. On the old system running syslog-ng
these messages are written to disk and visible near real-time and they
don't seem to be doing any rate limiting or otherwise.
Note that the debug run is *much* slower than the actual run. It's the usual
problem, if you instrument the system you also affect it. Performance-wise,
we have seen that a debug build can be up to ten times slower (sometimes
more, depending on config), but even the regular debug log alone, if turned
on, heavily changes performance and order of execution. The root cause is the
many debug outputs, which require at least some mutex snyc and that obviously
affects the overall system...
The debug log
that was taken however had file sync enable - and when that's enabled
this delay is not seen so I'm not suprised it wasn't visible. I can
produce a debug log with file sync disabled for comparison, that's
probably something I should have done in the first place.
From my POV, that actually doesn't help at all. The reason is that I don't
see any unusual in the log, and that means all I can see is that data is
being continously written...
I have tried an strace on the rsyslog process(es) when running
normally and haven't been able to get much out of them but that could
be my own limited knowledge of strace. `strace -p <pid>` yields a line
of information and then just stops, even though the rsyslog process is
still working fine and is obviously doing something. I've tried
attaching to the different threads as well with the same result.
... the strace will probably provide much better info, because here we see
the API calls. One thing I forgot to mention is that you should NOT use it
together with the debug log. The debug output api calls will make the strace
output unreadable.

For example, I have these line in one of my test scripts:

strace -T -tt -f -F tools/rsyslogd -c.....

I don't know out of my head what each option does, but that line may give you
a starting point. The end result is not a single line but rather a complete
log of each API called PLUS how much time the processing took. That, too, is
a quite enormous log...

Rainer
I will see if I can get a comparison debug log with file sync disabled
& if there is some guidance on getting a good strace I'm happy to
provide that output. Yesterday we had to move the server from RHEL 5.3
to RHEL 5.1 due to some discovered limits in tcp connections - either
intentionally or unintentionally different in our installation of 5.3.
The server is now performing (with file sync disabled) at an even
higher capacity on RH5.1 however I haven't had a chance to review
whether the write latency is still observed - hopefully today I can
look at that.
Thanks again for all the information.
Aaron
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com
Rainer Gerhards
2009-10-23 13:31:18 UTC
Permalink
-----Original Message-----
From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog-
bounces at lists.adiscon.com] On Behalf Of Aaron Nichols
Sent: Friday, October 23, 2009 3:22 PM
To: rsyslog-users
Subject: Re: [rsyslog]
Enabling$ActionFileEnableSynccausedmassiveincreaseinwrite
volume(bytes/sec) to NAS -cansomeone helpshed light?
Thank you everyone for your responses - rather than reply to each
post, here is some feedback.
The storage system - by all measure we have a "high end" system - It's
an EMC Clarion CX3-80 attached to a Celerra NS80 NAS head with 10gb
interfaces. The network file system is NFS, the Celerra can also
present CIFS shares but I doubt those would work any better. This is
not easily replaced with a linux box so that I can do rsync. The logs
must be made available to multiple hosts and our legacy implementation
*is* a Linux box writing to direct attached disk sharing NFS... it is
crumbling under our write/read load so we had to come up with another
solution. This solution is understood to be a bit of a compromise -
the ideal situation would be direct attached fiberchannel disk and a
cluster filesystem but that wasn't an option - mostly for cost. For
all the complaints about NFS it performs just fine - if we enable file
sync the write load just goes through the roof, but the NAS keeps up.
Client bufferring (or throttling) is still possible however because we
are transitioning to this new system many of these systems are
dual-logging to two destinations so their logging behavior can be
observed on two different systems. On the old system running syslog-ng
these messages are written to disk and visible near real-time and they
don't seem to be doing any rate limiting or otherwise.
Note that the debug run is *much* slower than the actual run. It's the usual
problem, if you instrument the system you also affect it. Performance-wise,
we have seen that a debug build can be up to ten times slower (sometimes
more, depending on config), but even the regular debug log alone, if turned
on, heavily changes performance and order of execution. The root cause is the
many debug outputs, which require at least some mutex snyc and that obviously
affects the overall system...
The debug log
that was taken however had file sync enable - and when that's enabled
this delay is not seen so I'm not suprised it wasn't visible. I can
produce a debug log with file sync disabled for comparison, that's
probably something I should have done in the first place.
From my POV, that actually doesn't help at all. The reason is that I don't
see any unusual in the log, and that means all I can see is that data is
being continously written...
I have tried an strace on the rsyslog process(es) when running
normally and haven't been able to get much out of them but that could
be my own limited knowledge of strace. `strace -p <pid>` yields a line
of information and then just stops, even though the rsyslog process is
still working fine and is obviously doing something. I've tried
attaching to the different threads as well with the same result.
... the strace will probably provide much better info, because here we see
the API calls. One thing I forgot to mention is that you should NOT use it
together with the debug log. The debug output api calls will make the strace
output unreadable.

For example, I have these line in one of my test scripts:

strace -T -tt -f -F tools/rsyslogd -c.....

I don't know out of my head what each option does, but that line may give you
a starting point. The end result is not a single line but rather a complete
log of each API called PLUS how much time the processing took. That, too, is
a quite enormous log...

Rainer
I will see if I can get a comparison debug log with file sync disabled
& if there is some guidance on getting a good strace I'm happy to
provide that output. Yesterday we had to move the server from RHEL 5.3
to RHEL 5.1 due to some discovered limits in tcp connections - either
intentionally or unintentionally different in our installation of 5.3.
The server is now performing (with file sync disabled) at an even
higher capacity on RH5.1 however I haven't had a chance to review
whether the write latency is still observed - hopefully today I can
look at that.
Thanks again for all the information.
Aaron
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com
Rainer Gerhards
2009-10-23 13:31:18 UTC
Permalink
-----Original Message-----
From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog-
bounces at lists.adiscon.com] On Behalf Of Aaron Nichols
Sent: Friday, October 23, 2009 3:22 PM
To: rsyslog-users
Subject: Re: [rsyslog]
Enabling$ActionFileEnableSynccausedmassiveincreaseinwrite
volume(bytes/sec) to NAS -cansomeone helpshed light?
Thank you everyone for your responses - rather than reply to each
post, here is some feedback.
The storage system - by all measure we have a "high end" system - It's
an EMC Clarion CX3-80 attached to a Celerra NS80 NAS head with 10gb
interfaces. The network file system is NFS, the Celerra can also
present CIFS shares but I doubt those would work any better. This is
not easily replaced with a linux box so that I can do rsync. The logs
must be made available to multiple hosts and our legacy implementation
*is* a Linux box writing to direct attached disk sharing NFS... it is
crumbling under our write/read load so we had to come up with another
solution. This solution is understood to be a bit of a compromise -
the ideal situation would be direct attached fiberchannel disk and a
cluster filesystem but that wasn't an option - mostly for cost. For
all the complaints about NFS it performs just fine - if we enable file
sync the write load just goes through the roof, but the NAS keeps up.
Client bufferring (or throttling) is still possible however because we
are transitioning to this new system many of these systems are
dual-logging to two destinations so their logging behavior can be
observed on two different systems. On the old system running syslog-ng
these messages are written to disk and visible near real-time and they
don't seem to be doing any rate limiting or otherwise.
Note that the debug run is *much* slower than the actual run. It's the usual
problem, if you instrument the system you also affect it. Performance-wise,
we have seen that a debug build can be up to ten times slower (sometimes
more, depending on config), but even the regular debug log alone, if turned
on, heavily changes performance and order of execution. The root cause is the
many debug outputs, which require at least some mutex snyc and that obviously
affects the overall system...
The debug log
that was taken however had file sync enable - and when that's enabled
this delay is not seen so I'm not suprised it wasn't visible. I can
produce a debug log with file sync disabled for comparison, that's
probably something I should have done in the first place.
From my POV, that actually doesn't help at all. The reason is that I don't
see any unusual in the log, and that means all I can see is that data is
being continously written...
I have tried an strace on the rsyslog process(es) when running
normally and haven't been able to get much out of them but that could
be my own limited knowledge of strace. `strace -p <pid>` yields a line
of information and then just stops, even though the rsyslog process is
still working fine and is obviously doing something. I've tried
attaching to the different threads as well with the same result.
... the strace will probably provide much better info, because here we see
the API calls. One thing I forgot to mention is that you should NOT use it
together with the debug log. The debug output api calls will make the strace
output unreadable.

For example, I have these line in one of my test scripts:

strace -T -tt -f -F tools/rsyslogd -c.....

I don't know out of my head what each option does, but that line may give you
a starting point. The end result is not a single line but rather a complete
log of each API called PLUS how much time the processing took. That, too, is
a quite enormous log...

Rainer
I will see if I can get a comparison debug log with file sync disabled
& if there is some guidance on getting a good strace I'm happy to
provide that output. Yesterday we had to move the server from RHEL 5.3
to RHEL 5.1 due to some discovered limits in tcp connections - either
intentionally or unintentionally different in our installation of 5.3.
The server is now performing (with file sync disabled) at an even
higher capacity on RH5.1 however I haven't had a chance to review
whether the write latency is still observed - hopefully today I can
look at that.
Thanks again for all the information.
Aaron
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com
Aaron Nichols
2009-10-23 13:22:24 UTC
Permalink
Thank you everyone for your responses - rather than reply to each
post, here is some feedback.

The storage system - by all measure we have a "high end" system - It's
an EMC Clarion CX3-80 attached to a Celerra NS80 NAS head with 10gb
interfaces. The network file system is NFS, the Celerra can also
present CIFS shares but I doubt those would work any better. This is
not easily replaced with a linux box so that I can do rsync. The logs
must be made available to multiple hosts and our legacy implementation
*is* a Linux box writing to direct attached disk sharing NFS... it is
crumbling under our write/read load so we had to come up with another
solution. This solution is understood to be a bit of a compromise -
the ideal situation would be direct attached fiberchannel disk and a
cluster filesystem but that wasn't an option - mostly for cost. For
all the complaints about NFS it performs just fine - if we enable file
sync the write load just goes through the roof, but the NAS keeps up.

Client bufferring (or throttling) is still possible however because we
are transitioning to this new system many of these systems are
dual-logging to two destinations so their logging behavior can be
observed on two different systems. On the old system running syslog-ng
these messages are written to disk and visible near real-time and they
don't seem to be doing any rate limiting or otherwise. The debug log
that was taken however had file sync enable - and when that's enabled
this delay is not seen so I'm not suprised it wasn't visible. I can
produce a debug log with file sync disabled for comparison, that's
probably something I should have done in the first place.

I have tried an strace on the rsyslog process(es) when running
normally and haven't been able to get much out of them but that could
be my own limited knowledge of strace. `strace -p <pid>` yields a line
of information and then just stops, even though the rsyslog process is
still working fine and is obviously doing something. I've tried
attaching to the different threads as well with the same result.

I will see if I can get a comparison debug log with file sync disabled
& if there is some guidance on getting a good strace I'm happy to
provide that output. Yesterday we had to move the server from RHEL 5.3
to RHEL 5.1 due to some discovered limits in tcp connections - either
intentionally or unintentionally different in our installation of 5.3.
The server is now performing (with file sync disabled) at an even
higher capacity on RH5.1 however I haven't had a chance to review
whether the write latency is still observed - hopefully today I can
look at that.

Thanks again for all the information.

Aaron
Aaron Nichols
2009-10-23 13:22:24 UTC
Permalink
Thank you everyone for your responses - rather than reply to each
post, here is some feedback.

The storage system - by all measure we have a "high end" system - It's
an EMC Clarion CX3-80 attached to a Celerra NS80 NAS head with 10gb
interfaces. The network file system is NFS, the Celerra can also
present CIFS shares but I doubt those would work any better. This is
not easily replaced with a linux box so that I can do rsync. The logs
must be made available to multiple hosts and our legacy implementation
*is* a Linux box writing to direct attached disk sharing NFS... it is
crumbling under our write/read load so we had to come up with another
solution. This solution is understood to be a bit of a compromise -
the ideal situation would be direct attached fiberchannel disk and a
cluster filesystem but that wasn't an option - mostly for cost. For
all the complaints about NFS it performs just fine - if we enable file
sync the write load just goes through the roof, but the NAS keeps up.

Client bufferring (or throttling) is still possible however because we
are transitioning to this new system many of these systems are
dual-logging to two destinations so their logging behavior can be
observed on two different systems. On the old system running syslog-ng
these messages are written to disk and visible near real-time and they
don't seem to be doing any rate limiting or otherwise. The debug log
that was taken however had file sync enable - and when that's enabled
this delay is not seen so I'm not suprised it wasn't visible. I can
produce a debug log with file sync disabled for comparison, that's
probably something I should have done in the first place.

I have tried an strace on the rsyslog process(es) when running
normally and haven't been able to get much out of them but that could
be my own limited knowledge of strace. `strace -p <pid>` yields a line
of information and then just stops, even though the rsyslog process is
still working fine and is obviously doing something. I've tried
attaching to the different threads as well with the same result.

I will see if I can get a comparison debug log with file sync disabled
& if there is some guidance on getting a good strace I'm happy to
provide that output. Yesterday we had to move the server from RHEL 5.3
to RHEL 5.1 due to some discovered limits in tcp connections - either
intentionally or unintentionally different in our installation of 5.3.
The server is now performing (with file sync disabled) at an even
higher capacity on RH5.1 however I haven't had a chance to review
whether the write latency is still observed - hopefully today I can
look at that.

Thanks again for all the information.

Aaron
Aaron Nichols
2009-10-23 13:22:24 UTC
Permalink
Thank you everyone for your responses - rather than reply to each
post, here is some feedback.

The storage system - by all measure we have a "high end" system - It's
an EMC Clarion CX3-80 attached to a Celerra NS80 NAS head with 10gb
interfaces. The network file system is NFS, the Celerra can also
present CIFS shares but I doubt those would work any better. This is
not easily replaced with a linux box so that I can do rsync. The logs
must be made available to multiple hosts and our legacy implementation
*is* a Linux box writing to direct attached disk sharing NFS... it is
crumbling under our write/read load so we had to come up with another
solution. This solution is understood to be a bit of a compromise -
the ideal situation would be direct attached fiberchannel disk and a
cluster filesystem but that wasn't an option - mostly for cost. For
all the complaints about NFS it performs just fine - if we enable file
sync the write load just goes through the roof, but the NAS keeps up.

Client bufferring (or throttling) is still possible however because we
are transitioning to this new system many of these systems are
dual-logging to two destinations so their logging behavior can be
observed on two different systems. On the old system running syslog-ng
these messages are written to disk and visible near real-time and they
don't seem to be doing any rate limiting or otherwise. The debug log
that was taken however had file sync enable - and when that's enabled
this delay is not seen so I'm not suprised it wasn't visible. I can
produce a debug log with file sync disabled for comparison, that's
probably something I should have done in the first place.

I have tried an strace on the rsyslog process(es) when running
normally and haven't been able to get much out of them but that could
be my own limited knowledge of strace. `strace -p <pid>` yields a line
of information and then just stops, even though the rsyslog process is
still working fine and is obviously doing something. I've tried
attaching to the different threads as well with the same result.

I will see if I can get a comparison debug log with file sync disabled
& if there is some guidance on getting a good strace I'm happy to
provide that output. Yesterday we had to move the server from RHEL 5.3
to RHEL 5.1 due to some discovered limits in tcp connections - either
intentionally or unintentionally different in our installation of 5.3.
The server is now performing (with file sync disabled) at an even
higher capacity on RH5.1 however I haven't had a chance to review
whether the write latency is still observed - hopefully today I can
look at that.

Thanks again for all the information.

Aaron
david
2009-10-23 08:52:02 UTC
Permalink
Post by Alan T DeKok
Post by Rainer Gerhards
I think the main point of the OP was that he logs to a NAS device, so I do
not see way to write to it without going through a network share.
Configure rsyslog to write to local disk, and rsync periodically to
the network share. Or, use another instance of rsyslog to copy from
local disk to the network share.
Any other method will have problems. Even NetApp systems have issues
from time to time.
I've found that for every situation I've ever heard someone (including me)
say 'never do that' that there is _some_ situaton where 'that' is exactly
the right thing to do ;-)

I agree that in almost all cases, writing logs to NFS is not a good idea.

you really will need a high-end NFS server to have any chance of it
working (and the load that the logging will put on the server is going to
be significant, unless something does caching, which is what is causing
your problems) so this is a _very_ expensive way to go to make logging
work.

if you are putting the logs on NFS to make them readable by a bunch of
machines, consider running rsyslog on a linux system and let that linux
system export the log directory via NFS. for read-only access from many
machines, this is probably going to rivel the speed of a very expensive
NFS server, while being much simpler, cheaper, and more reliable (if you
need redundant hardware things get more complicated, but it's still not
hard to make it reliable and it will definantly be much cheaper)

David Lang
david
2009-10-23 08:52:02 UTC
Permalink
Post by Alan T DeKok
Post by Rainer Gerhards
I think the main point of the OP was that he logs to a NAS device, so I do
not see way to write to it without going through a network share.
Configure rsyslog to write to local disk, and rsync periodically to
the network share. Or, use another instance of rsyslog to copy from
local disk to the network share.
Any other method will have problems. Even NetApp systems have issues
from time to time.
I've found that for every situation I've ever heard someone (including me)
say 'never do that' that there is _some_ situaton where 'that' is exactly
the right thing to do ;-)

I agree that in almost all cases, writing logs to NFS is not a good idea.

you really will need a high-end NFS server to have any chance of it
working (and the load that the logging will put on the server is going to
be significant, unless something does caching, which is what is causing
your problems) so this is a _very_ expensive way to go to make logging
work.

if you are putting the logs on NFS to make them readable by a bunch of
machines, consider running rsyslog on a linux system and let that linux
system export the log directory via NFS. for read-only access from many
machines, this is probably going to rivel the speed of a very expensive
NFS server, while being much simpler, cheaper, and more reliable (if you
need redundant hardware things get more complicated, but it's still not
hard to make it reliable and it will definantly be much cheaper)

David Lang
david
2009-10-23 08:52:02 UTC
Permalink
Post by Alan T DeKok
Post by Rainer Gerhards
I think the main point of the OP was that he logs to a NAS device, so I do
not see way to write to it without going through a network share.
Configure rsyslog to write to local disk, and rsync periodically to
the network share. Or, use another instance of rsyslog to copy from
local disk to the network share.
Any other method will have problems. Even NetApp systems have issues
from time to time.
I've found that for every situation I've ever heard someone (including me)
say 'never do that' that there is _some_ situaton where 'that' is exactly
the right thing to do ;-)

I agree that in almost all cases, writing logs to NFS is not a good idea.

you really will need a high-end NFS server to have any chance of it
working (and the load that the logging will put on the server is going to
be significant, unless something does caching, which is what is causing
your problems) so this is a _very_ expensive way to go to make logging
work.

if you are putting the logs on NFS to make them readable by a bunch of
machines, consider running rsyslog on a linux system and let that linux
system export the log directory via NFS. for read-only access from many
machines, this is probably going to rivel the speed of a very expensive
NFS server, while being much simpler, cheaper, and more reliable (if you
need redundant hardware things get more complicated, but it's still not
hard to make it reliable and it will definantly be much cheaper)

David Lang
david
2009-10-23 07:17:43 UTC
Permalink
Post by Rainer Gerhards
I think the main point of the OP was that he logs to a NAS device, so I do
not see way to write to it without going through a network share.
ahh, have we been assuming that this is NFS when all he said was NAS?

if so, please clarify what protocol you are using to talk to the NAS. it
could make a huge difference here.

David Lang
Post by Rainer Gerhards
Other than that, I agree to your statement.
Rainer
-----Original Message-----
From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog-
bounces at lists.adiscon.com] On Behalf Of Alan T DeKok
Sent: Friday, October 23, 2009 8:40 AM
To: rsyslog-users
Subject: Re: [rsyslog] Enabling
$ActionFileEnableSynccausedmassiveincreaseinwrite volume (bytes/sec) to
NAS -cansomeone helpshed light?
Post by Rainer Gerhards
So my conclusion really is that you need to look at the NFS layer.
Logs should NEVER be written to an NFS mount.
David gave a good explanation as to why. But the above prohibition
should be made clear in the docs, if it isn't already.
If the NFS server goes away, then the write transaction rate will go
to zero. This likely isn't what people want from a syslog server.
the syslog protocol. Using NFS as a replacement for syslog is wrong on
many levels.
Alan DeKok.
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com
Alan T DeKok
2009-10-23 07:42:22 UTC
Permalink
Post by Rainer Gerhards
I think the main point of the OP was that he logs to a NAS device, so I do
not see way to write to it without going through a network share.
Configure rsyslog to write to local disk, and rsync periodically to
the network share. Or, use another instance of rsyslog to copy from
local disk to the network share.

Any other method will have problems. Even NetApp systems have issues
from time to time.

Alan DeKok.
david
2009-10-23 07:17:43 UTC
Permalink
Post by Rainer Gerhards
I think the main point of the OP was that he logs to a NAS device, so I do
not see way to write to it without going through a network share.
ahh, have we been assuming that this is NFS when all he said was NAS?

if so, please clarify what protocol you are using to talk to the NAS. it
could make a huge difference here.

David Lang
Post by Rainer Gerhards
Other than that, I agree to your statement.
Rainer
-----Original Message-----
From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog-
bounces at lists.adiscon.com] On Behalf Of Alan T DeKok
Sent: Friday, October 23, 2009 8:40 AM
To: rsyslog-users
Subject: Re: [rsyslog] Enabling
$ActionFileEnableSynccausedmassiveincreaseinwrite volume (bytes/sec) to
NAS -cansomeone helpshed light?
Post by Rainer Gerhards
So my conclusion really is that you need to look at the NFS layer.
Logs should NEVER be written to an NFS mount.
David gave a good explanation as to why. But the above prohibition
should be made clear in the docs, if it isn't already.
If the NFS server goes away, then the write transaction rate will go
to zero. This likely isn't what people want from a syslog server.
the syslog protocol. Using NFS as a replacement for syslog is wrong on
many levels.
Alan DeKok.
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com
Alan T DeKok
2009-10-23 07:42:22 UTC
Permalink
Post by Rainer Gerhards
I think the main point of the OP was that he logs to a NAS device, so I do
not see way to write to it without going through a network share.
Configure rsyslog to write to local disk, and rsync periodically to
the network share. Or, use another instance of rsyslog to copy from
local disk to the network share.

Any other method will have problems. Even NetApp systems have issues
from time to time.

Alan DeKok.
david
2009-10-23 07:17:43 UTC
Permalink
Post by Rainer Gerhards
I think the main point of the OP was that he logs to a NAS device, so I do
not see way to write to it without going through a network share.
ahh, have we been assuming that this is NFS when all he said was NAS?

if so, please clarify what protocol you are using to talk to the NAS. it
could make a huge difference here.

David Lang
Post by Rainer Gerhards
Other than that, I agree to your statement.
Rainer
-----Original Message-----
From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog-
bounces at lists.adiscon.com] On Behalf Of Alan T DeKok
Sent: Friday, October 23, 2009 8:40 AM
To: rsyslog-users
Subject: Re: [rsyslog] Enabling
$ActionFileEnableSynccausedmassiveincreaseinwrite volume (bytes/sec) to
NAS -cansomeone helpshed light?
Post by Rainer Gerhards
So my conclusion really is that you need to look at the NFS layer.
Logs should NEVER be written to an NFS mount.
David gave a good explanation as to why. But the above prohibition
should be made clear in the docs, if it isn't already.
If the NFS server goes away, then the write transaction rate will go
to zero. This likely isn't what people want from a syslog server.
the syslog protocol. Using NFS as a replacement for syslog is wrong on
many levels.
Alan DeKok.
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com
Alan T DeKok
2009-10-23 07:42:22 UTC
Permalink
Post by Rainer Gerhards
I think the main point of the OP was that he logs to a NAS device, so I do
not see way to write to it without going through a network share.
Configure rsyslog to write to local disk, and rsync periodically to
the network share. Or, use another instance of rsyslog to copy from
local disk to the network share.

Any other method will have problems. Even NetApp systems have issues
from time to time.

Alan DeKok.
Rainer Gerhards
2009-10-23 07:03:05 UTC
Permalink
I think the main point of the OP was that he logs to a NAS device, so I do
not see way to write to it without going through a network share.

Other than that, I agree to your statement.

Rainer
-----Original Message-----
From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog-
bounces at lists.adiscon.com] On Behalf Of Alan T DeKok
Sent: Friday, October 23, 2009 8:40 AM
To: rsyslog-users
Subject: Re: [rsyslog] Enabling
$ActionFileEnableSynccausedmassiveincreaseinwrite volume (bytes/sec) to
NAS -cansomeone helpshed light?
Post by Rainer Gerhards
So my conclusion really is that you need to look at the NFS layer.
Logs should NEVER be written to an NFS mount.
David gave a good explanation as to why. But the above prohibition
should be made clear in the docs, if it isn't already.
If the NFS server goes away, then the write transaction rate will go
to zero. This likely isn't what people want from a syslog server.
the syslog protocol. Using NFS as a replacement for syslog is wrong on
many levels.
Alan DeKok.
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com
Rainer Gerhards
2009-10-23 07:03:05 UTC
Permalink
I think the main point of the OP was that he logs to a NAS device, so I do
not see way to write to it without going through a network share.

Other than that, I agree to your statement.

Rainer
-----Original Message-----
From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog-
bounces at lists.adiscon.com] On Behalf Of Alan T DeKok
Sent: Friday, October 23, 2009 8:40 AM
To: rsyslog-users
Subject: Re: [rsyslog] Enabling
$ActionFileEnableSynccausedmassiveincreaseinwrite volume (bytes/sec) to
NAS -cansomeone helpshed light?
Post by Rainer Gerhards
So my conclusion really is that you need to look at the NFS layer.
Logs should NEVER be written to an NFS mount.
David gave a good explanation as to why. But the above prohibition
should be made clear in the docs, if it isn't already.
If the NFS server goes away, then the write transaction rate will go
to zero. This likely isn't what people want from a syslog server.
the syslog protocol. Using NFS as a replacement for syslog is wrong on
many levels.
Alan DeKok.
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com
Rainer Gerhards
2009-10-23 07:03:05 UTC
Permalink
I think the main point of the OP was that he logs to a NAS device, so I do
not see way to write to it without going through a network share.

Other than that, I agree to your statement.

Rainer
-----Original Message-----
From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog-
bounces at lists.adiscon.com] On Behalf Of Alan T DeKok
Sent: Friday, October 23, 2009 8:40 AM
To: rsyslog-users
Subject: Re: [rsyslog] Enabling
$ActionFileEnableSynccausedmassiveincreaseinwrite volume (bytes/sec) to
NAS -cansomeone helpshed light?
Post by Rainer Gerhards
So my conclusion really is that you need to look at the NFS layer.
Logs should NEVER be written to an NFS mount.
David gave a good explanation as to why. But the above prohibition
should be made clear in the docs, if it isn't already.
If the NFS server goes away, then the write transaction rate will go
to zero. This likely isn't what people want from a syslog server.
the syslog protocol. Using NFS as a replacement for syslog is wrong on
many levels.
Alan DeKok.
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com
david
2009-10-23 06:32:05 UTC
Permalink
Post by Rainer Gerhards
For the record, changing the sync behavior of rsyslog DOES fix the
original complaint. When we have file sync disabled we see up to 30
second pauses between writes to files which get multiple messages per
second delivered to the rsyslog host.
There is nothing in the log that points to rsyslog doing these delays.
So my conclusion really is that you need to look at the NFS layer. To me, it
looks like NFS does some local caching, and maybe 30 seconds is the timeout
at which NFS writes data from files that are constantly being written in
short intervals. With sync, NFS probably writes and rewrites each block as
David has said, thus resulting in the far larger i/o footprint. But I am far
from being a NFS expert, so it would probably make sense to ask one ;)
this reminded me of something.

a standard complient NFS implementation would require that any write to a
NFS volume go all the way to the destination and be safe on disk before
the write call returns to the calling program.

this doesn't mean when you do a fsync, it means each individual write
call.

for the program to think the data has been written and it not be available
on the NFS drive to other systems there has to be some non-standard
caching in place here.

note that unless you have a high-end NFS server that includes some
battery-backed ram on it, standard complient behavior also means a very
low write transaction rate.

note that the caching could be on the reader machines instead of on the
writer, and it could be something like caching of the metadata (including
size and last modified timestamp)

David Lang
Alan T DeKok
2009-10-23 06:39:35 UTC
Permalink
Post by Rainer Gerhards
So my conclusion really is that you need to look at the NFS layer.
Logs should NEVER be written to an NFS mount.

David gave a good explanation as to why. But the above prohibition
should be made clear in the docs, if it isn't already.

If the NFS server goes away, then the write transaction rate will go
to zero. This likely isn't what people want from a syslog server.

There is already a way to get syslog data from one system to another:
the syslog protocol. Using NFS as a replacement for syslog is wrong on
many levels.

Alan DeKok.
david
2009-10-23 06:32:05 UTC
Permalink
Post by Rainer Gerhards
For the record, changing the sync behavior of rsyslog DOES fix the
original complaint. When we have file sync disabled we see up to 30
second pauses between writes to files which get multiple messages per
second delivered to the rsyslog host.
There is nothing in the log that points to rsyslog doing these delays.
So my conclusion really is that you need to look at the NFS layer. To me, it
looks like NFS does some local caching, and maybe 30 seconds is the timeout
at which NFS writes data from files that are constantly being written in
short intervals. With sync, NFS probably writes and rewrites each block as
David has said, thus resulting in the far larger i/o footprint. But I am far
from being a NFS expert, so it would probably make sense to ask one ;)
this reminded me of something.

a standard complient NFS implementation would require that any write to a
NFS volume go all the way to the destination and be safe on disk before
the write call returns to the calling program.

this doesn't mean when you do a fsync, it means each individual write
call.

for the program to think the data has been written and it not be available
on the NFS drive to other systems there has to be some non-standard
caching in place here.

note that unless you have a high-end NFS server that includes some
battery-backed ram on it, standard complient behavior also means a very
low write transaction rate.

note that the caching could be on the reader machines instead of on the
writer, and it could be something like caching of the metadata (including
size and last modified timestamp)

David Lang
Alan T DeKok
2009-10-23 06:39:35 UTC
Permalink
Post by Rainer Gerhards
So my conclusion really is that you need to look at the NFS layer.
Logs should NEVER be written to an NFS mount.

David gave a good explanation as to why. But the above prohibition
should be made clear in the docs, if it isn't already.

If the NFS server goes away, then the write transaction rate will go
to zero. This likely isn't what people want from a syslog server.

There is already a way to get syslog data from one system to another:
the syslog protocol. Using NFS as a replacement for syslog is wrong on
many levels.

Alan DeKok.
david
2009-10-23 06:32:05 UTC
Permalink
Post by Rainer Gerhards
For the record, changing the sync behavior of rsyslog DOES fix the
original complaint. When we have file sync disabled we see up to 30
second pauses between writes to files which get multiple messages per
second delivered to the rsyslog host.
There is nothing in the log that points to rsyslog doing these delays.
So my conclusion really is that you need to look at the NFS layer. To me, it
looks like NFS does some local caching, and maybe 30 seconds is the timeout
at which NFS writes data from files that are constantly being written in
short intervals. With sync, NFS probably writes and rewrites each block as
David has said, thus resulting in the far larger i/o footprint. But I am far
from being a NFS expert, so it would probably make sense to ask one ;)
this reminded me of something.

a standard complient NFS implementation would require that any write to a
NFS volume go all the way to the destination and be safe on disk before
the write call returns to the calling program.

this doesn't mean when you do a fsync, it means each individual write
call.

for the program to think the data has been written and it not be available
on the NFS drive to other systems there has to be some non-standard
caching in place here.

note that unless you have a high-end NFS server that includes some
battery-backed ram on it, standard complient behavior also means a very
low write transaction rate.

note that the caching could be on the reader machines instead of on the
writer, and it could be something like caching of the metadata (including
size and last modified timestamp)

David Lang
Alan T DeKok
2009-10-23 06:39:35 UTC
Permalink
Post by Rainer Gerhards
So my conclusion really is that you need to look at the NFS layer.
Logs should NEVER be written to an NFS mount.

David gave a good explanation as to why. But the above prohibition
should be made clear in the docs, if it isn't already.

If the NFS server goes away, then the write transaction rate will go
to zero. This likely isn't what people want from a syslog server.

There is already a way to get syslog data from one system to another:
the syslog protocol. Using NFS as a replacement for syslog is wrong on
many levels.

Alan DeKok.
Rainer Gerhards
2009-10-23 06:21:01 UTC
Permalink
Aaron,

I am replying on-list so that others can follow (and I hope they have some
advise). I have stripped the sensitive part of your message. In short for all
others: I got a debug log and have now reviewed it.
During this debug run we saw NAS write volume increase from a "normal"
2-3mbps and 35 iops up to 500 iops and 5.6mbps toward the end of the
test. I do understand how enabling file sync will increase iops but
I'm unclear on why the volume of data being written would increase so
much. If rsyslog has file sync "disabled" then I would expect bursts
of data and if file sync is "enabled" I would expect the same amount
of data to be written in much more frequent, but also much smaller
chunks. NFS could be a culprit here, but I'm curious what NFS
parameters would impact this behavior.
The debug log does not show anything that looks specifically suspect. The
incoming traffic is intense and the default queue size is overrun quickly,
resulting in voluntary delays of the sender (via tcp). Looking at the
traffic, I think this is perfectly legal and increasing the queue size would
not bring any benefit (I think the end result would just be a larger memory
footprint without any advantage).

I do not see any unusual write behavior. Of course, the debug log could be
more verbose as it does not specifically tell about the syncs (usually, the
log is missing some information you'd like to see for a given instant, but
you can't include everything...). But looking at the code base plus the log
really does not reveal anything suspicious.

One thing you might want to try is doing an strace so that we can look at it
and see which exact OS calls are made and how long they last.
For the record, changing the sync behavior of rsyslog DOES fix the
original complaint. When we have file sync disabled we see up to 30
second pauses between writes to files which get multiple messages per
second delivered to the rsyslog host.
There is nothing in the log that points to rsyslog doing these delays.

So my conclusion really is that you need to look at the NFS layer. To me, it
looks like NFS does some local caching, and maybe 30 seconds is the timeout
at which NFS writes data from files that are constantly being written in
short intervals. With sync, NFS probably writes and rewrites each block as
David has said, thus resulting in the far larger i/o footprint. But I am far
from being a NFS expert, so it would probably make sense to ask one ;)
I have looked at the option
$OMFileFlushOnTXEnd and wonder if it would be a preferred method to
resolve this in the 4.1.1 version (or if it even works there).
This makes no sense for v4 (I think it is not even available).

Sorry I have no better answer. If get a strace, I'd gladly look at it.

Rainer
Rainer Gerhards
2009-10-23 06:21:01 UTC
Permalink
Aaron,

I am replying on-list so that others can follow (and I hope they have some
advise). I have stripped the sensitive part of your message. In short for all
others: I got a debug log and have now reviewed it.
During this debug run we saw NAS write volume increase from a "normal"
2-3mbps and 35 iops up to 500 iops and 5.6mbps toward the end of the
test. I do understand how enabling file sync will increase iops but
I'm unclear on why the volume of data being written would increase so
much. If rsyslog has file sync "disabled" then I would expect bursts
of data and if file sync is "enabled" I would expect the same amount
of data to be written in much more frequent, but also much smaller
chunks. NFS could be a culprit here, but I'm curious what NFS
parameters would impact this behavior.
The debug log does not show anything that looks specifically suspect. The
incoming traffic is intense and the default queue size is overrun quickly,
resulting in voluntary delays of the sender (via tcp). Looking at the
traffic, I think this is perfectly legal and increasing the queue size would
not bring any benefit (I think the end result would just be a larger memory
footprint without any advantage).

I do not see any unusual write behavior. Of course, the debug log could be
more verbose as it does not specifically tell about the syncs (usually, the
log is missing some information you'd like to see for a given instant, but
you can't include everything...). But looking at the code base plus the log
really does not reveal anything suspicious.

One thing you might want to try is doing an strace so that we can look at it
and see which exact OS calls are made and how long they last.
For the record, changing the sync behavior of rsyslog DOES fix the
original complaint. When we have file sync disabled we see up to 30
second pauses between writes to files which get multiple messages per
second delivered to the rsyslog host.
There is nothing in the log that points to rsyslog doing these delays.

So my conclusion really is that you need to look at the NFS layer. To me, it
looks like NFS does some local caching, and maybe 30 seconds is the timeout
at which NFS writes data from files that are constantly being written in
short intervals. With sync, NFS probably writes and rewrites each block as
David has said, thus resulting in the far larger i/o footprint. But I am far
from being a NFS expert, so it would probably make sense to ask one ;)
I have looked at the option
$OMFileFlushOnTXEnd and wonder if it would be a preferred method to
resolve this in the 4.1.1 version (or if it even works there).
This makes no sense for v4 (I think it is not even available).

Sorry I have no better answer. If get a strace, I'd gladly look at it.

Rainer
Rainer Gerhards
2009-10-23 06:21:01 UTC
Permalink
Aaron,

I am replying on-list so that others can follow (and I hope they have some
advise). I have stripped the sensitive part of your message. In short for all
others: I got a debug log and have now reviewed it.
During this debug run we saw NAS write volume increase from a "normal"
2-3mbps and 35 iops up to 500 iops and 5.6mbps toward the end of the
test. I do understand how enabling file sync will increase iops but
I'm unclear on why the volume of data being written would increase so
much. If rsyslog has file sync "disabled" then I would expect bursts
of data and if file sync is "enabled" I would expect the same amount
of data to be written in much more frequent, but also much smaller
chunks. NFS could be a culprit here, but I'm curious what NFS
parameters would impact this behavior.
The debug log does not show anything that looks specifically suspect. The
incoming traffic is intense and the default queue size is overrun quickly,
resulting in voluntary delays of the sender (via tcp). Looking at the
traffic, I think this is perfectly legal and increasing the queue size would
not bring any benefit (I think the end result would just be a larger memory
footprint without any advantage).

I do not see any unusual write behavior. Of course, the debug log could be
more verbose as it does not specifically tell about the syncs (usually, the
log is missing some information you'd like to see for a given instant, but
you can't include everything...). But looking at the code base plus the log
really does not reveal anything suspicious.

One thing you might want to try is doing an strace so that we can look at it
and see which exact OS calls are made and how long they last.
For the record, changing the sync behavior of rsyslog DOES fix the
original complaint. When we have file sync disabled we see up to 30
second pauses between writes to files which get multiple messages per
second delivered to the rsyslog host.
There is nothing in the log that points to rsyslog doing these delays.

So my conclusion really is that you need to look at the NFS layer. To me, it
looks like NFS does some local caching, and maybe 30 seconds is the timeout
at which NFS writes data from files that are constantly being written in
short intervals. With sync, NFS probably writes and rewrites each block as
David has said, thus resulting in the far larger i/o footprint. But I am far
from being a NFS expert, so it would probably make sense to ask one ;)
I have looked at the option
$OMFileFlushOnTXEnd and wonder if it would be a preferred method to
resolve this in the 4.1.1 version (or if it even works there).
This makes no sense for v4 (I think it is not even available).

Sorry I have no better answer. If get a strace, I'd gladly look at it.

Rainer
Rainer Gerhards
2009-10-21 15:32:59 UTC
Permalink
-----Original Message-----
From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog-
bounces at lists.adiscon.com] On Behalf Of Aaron Nichols
Sent: Wednesday, October 21, 2009 4:25 PM
To: rsyslog-users
Subject: Re: [rsyslog] Enabling
$ActionFileEnableSynccausedmassiveincreaseinwrite volume (bytes/sec) to
NAS -cansomeone helpshed light?
On Wed, Oct 21, 2009 at 7:22 AM, Rainer Gerhards
Post by Rainer Gerhards
That indeed sounds odd. Are you sure the $ActionFileEnableSync is in
front
Post by Rainer Gerhards
of
all write statements? On some systems, write statements may be
included via
Post by Rainer Gerhards
files and if these are in front of the directive, those would be
unaffected
Post by Rainer Gerhards
by it...
We do also use includes but in this case $ActionFileEnableSync was at the
top of the included file with all write statements below it. I will also
note that the write volume changed significantly with two different config
"normal" volume - $ActionFileEnableSync off & all actions include "-" in
front : 2-3mbps ~30 iops
$ActionFileEnableSync off & all actions exclude "-" in front : 5mbps (~100%
increase) (iops unknown)
$ActionFileEnableSync on & all actions exclude "-" in front : 9mbps
(~300-400% increase) ~900 iops
I am using a mix of static and dynamic filename actions and some actions use
format templates as well. I suppose there could be one of those combinations
which is not honoring the global $ActionFileEnableSync configuration.
I may be overlooking something here, but the check is very straightforward
and I don't see why some should be affected but others not. This is the file
(and spot) in question:

http://git.adiscon.com/?p=rsyslog.git;a=blob;f=tools/omfile.c;h=bb12b4b61dacb
84167b9b0d17e49a597342dddc0;hb=07b076ddcfed4ea9b447d0be574d1dcdb799bc2f#l623

Do a search for "bEnableSync" to see all occurences.

But the debug log will show...

Rainer
Again,
I need to do some testing because at this point I'm just making guesses
without much information.
Thanks,
Aaron
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com
Rainer Gerhards
2009-10-21 15:32:59 UTC
Permalink
-----Original Message-----
From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog-
bounces at lists.adiscon.com] On Behalf Of Aaron Nichols
Sent: Wednesday, October 21, 2009 4:25 PM
To: rsyslog-users
Subject: Re: [rsyslog] Enabling
$ActionFileEnableSynccausedmassiveincreaseinwrite volume (bytes/sec) to
NAS -cansomeone helpshed light?
On Wed, Oct 21, 2009 at 7:22 AM, Rainer Gerhards
Post by Rainer Gerhards
That indeed sounds odd. Are you sure the $ActionFileEnableSync is in
front
Post by Rainer Gerhards
of
all write statements? On some systems, write statements may be
included via
Post by Rainer Gerhards
files and if these are in front of the directive, those would be
unaffected
Post by Rainer Gerhards
by it...
We do also use includes but in this case $ActionFileEnableSync was at the
top of the included file with all write statements below it. I will also
note that the write volume changed significantly with two different config
"normal" volume - $ActionFileEnableSync off & all actions include "-" in
front : 2-3mbps ~30 iops
$ActionFileEnableSync off & all actions exclude "-" in front : 5mbps (~100%
increase) (iops unknown)
$ActionFileEnableSync on & all actions exclude "-" in front : 9mbps
(~300-400% increase) ~900 iops
I am using a mix of static and dynamic filename actions and some actions use
format templates as well. I suppose there could be one of those combinations
which is not honoring the global $ActionFileEnableSync configuration.
I may be overlooking something here, but the check is very straightforward
and I don't see why some should be affected but others not. This is the file
(and spot) in question:

http://git.adiscon.com/?p=rsyslog.git;a=blob;f=tools/omfile.c;h=bb12b4b61dacb
84167b9b0d17e49a597342dddc0;hb=07b076ddcfed4ea9b447d0be574d1dcdb799bc2f#l623

Do a search for "bEnableSync" to see all occurences.

But the debug log will show...

Rainer
Again,
I need to do some testing because at this point I'm just making guesses
without much information.
Thanks,
Aaron
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com
Rainer Gerhards
2009-10-21 15:32:59 UTC
Permalink
-----Original Message-----
From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog-
bounces at lists.adiscon.com] On Behalf Of Aaron Nichols
Sent: Wednesday, October 21, 2009 4:25 PM
To: rsyslog-users
Subject: Re: [rsyslog] Enabling
$ActionFileEnableSynccausedmassiveincreaseinwrite volume (bytes/sec) to
NAS -cansomeone helpshed light?
On Wed, Oct 21, 2009 at 7:22 AM, Rainer Gerhards
Post by Rainer Gerhards
That indeed sounds odd. Are you sure the $ActionFileEnableSync is in
front
Post by Rainer Gerhards
of
all write statements? On some systems, write statements may be
included via
Post by Rainer Gerhards
files and if these are in front of the directive, those would be
unaffected
Post by Rainer Gerhards
by it...
We do also use includes but in this case $ActionFileEnableSync was at the
top of the included file with all write statements below it. I will also
note that the write volume changed significantly with two different config
"normal" volume - $ActionFileEnableSync off & all actions include "-" in
front : 2-3mbps ~30 iops
$ActionFileEnableSync off & all actions exclude "-" in front : 5mbps (~100%
increase) (iops unknown)
$ActionFileEnableSync on & all actions exclude "-" in front : 9mbps
(~300-400% increase) ~900 iops
I am using a mix of static and dynamic filename actions and some actions use
format templates as well. I suppose there could be one of those combinations
which is not honoring the global $ActionFileEnableSync configuration.
I may be overlooking something here, but the check is very straightforward
and I don't see why some should be affected but others not. This is the file
(and spot) in question:

http://git.adiscon.com/?p=rsyslog.git;a=blob;f=tools/omfile.c;h=bb12b4b61dacb
84167b9b0d17e49a597342dddc0;hb=07b076ddcfed4ea9b447d0be574d1dcdb799bc2f#l623

Do a search for "bEnableSync" to see all occurences.

But the debug log will show...

Rainer
Again,
I need to do some testing because at this point I'm just making guesses
without much information.
Thanks,
Aaron
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com
Aaron Nichols
2009-10-21 14:25:21 UTC
Permalink
On Wed, Oct 21, 2009 at 7:22 AM, Rainer Gerhards
Post by Rainer Gerhards
That indeed sounds odd. Are you sure the $ActionFileEnableSync is in front of
all write statements? On some systems, write statements may be included via
files and if these are in front of the directive, those would be unaffected
by it...
We do also use includes but in this case $ActionFileEnableSync was at the
top of the included file with all write statements below it. I will also
note that the write volume changed significantly with two different config
changes. To uber-simplify my original email:

"normal" volume - $ActionFileEnableSync off & all actions include "-" in
front : 2-3mbps ~30 iops
$ActionFileEnableSync off & all actions exclude "-" in front : 5mbps (~100%
increase) (iops unknown)
$ActionFileEnableSync on & all actions exclude "-" in front : 9mbps
(~300-400% increase) ~900 iops

I am using a mix of static and dynamic filename actions and some actions use
format templates as well. I suppose there could be one of those combinations
which is not honoring the global $ActionFileEnableSync configuration. Again,
I need to do some testing because at this point I'm just making guesses
without much information.

Thanks,
Aaron
Aaron Nichols
2009-10-21 14:25:21 UTC
Permalink
On Wed, Oct 21, 2009 at 7:22 AM, Rainer Gerhards
Post by Rainer Gerhards
That indeed sounds odd. Are you sure the $ActionFileEnableSync is in front of
all write statements? On some systems, write statements may be included via
files and if these are in front of the directive, those would be unaffected
by it...
We do also use includes but in this case $ActionFileEnableSync was at the
top of the included file with all write statements below it. I will also
note that the write volume changed significantly with two different config
changes. To uber-simplify my original email:

"normal" volume - $ActionFileEnableSync off & all actions include "-" in
front : 2-3mbps ~30 iops
$ActionFileEnableSync off & all actions exclude "-" in front : 5mbps (~100%
increase) (iops unknown)
$ActionFileEnableSync on & all actions exclude "-" in front : 9mbps
(~300-400% increase) ~900 iops

I am using a mix of static and dynamic filename actions and some actions use
format templates as well. I suppose there could be one of those combinations
which is not honoring the global $ActionFileEnableSync configuration. Again,
I need to do some testing because at this point I'm just making guesses
without much information.

Thanks,
Aaron
Aaron Nichols
2009-10-21 14:25:21 UTC
Permalink
On Wed, Oct 21, 2009 at 7:22 AM, Rainer Gerhards
Post by Rainer Gerhards
That indeed sounds odd. Are you sure the $ActionFileEnableSync is in front of
all write statements? On some systems, write statements may be included via
files and if these are in front of the directive, those would be unaffected
by it...
We do also use includes but in this case $ActionFileEnableSync was at the
top of the included file with all write statements below it. I will also
note that the write volume changed significantly with two different config
changes. To uber-simplify my original email:

"normal" volume - $ActionFileEnableSync off & all actions include "-" in
front : 2-3mbps ~30 iops
$ActionFileEnableSync off & all actions exclude "-" in front : 5mbps (~100%
increase) (iops unknown)
$ActionFileEnableSync on & all actions exclude "-" in front : 9mbps
(~300-400% increase) ~900 iops

I am using a mix of static and dynamic filename actions and some actions use
format templates as well. I suppose there could be one of those combinations
which is not honoring the global $ActionFileEnableSync configuration. Again,
I need to do some testing because at this point I'm just making guesses
without much information.

Thanks,
Aaron
Rainer Gerhards
2009-10-21 13:22:29 UTC
Permalink
-----Original Message-----
From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog-
bounces at lists.adiscon.com] On Behalf Of Aaron Nichols
Sent: Wednesday, October 21, 2009 3:18 PM
To: rsyslog-users
Subject: Re: [rsyslog] Enabling $ActionFileEnableSync
causedmassiveincreaseinwrite volume (bytes/sec) to NAS - cansomeone
helpshed light?
On Wed, Oct 21, 2009 at 1:38 AM, Rainer Gerhards
Post by Rainer Gerhards
See my other note, but there may be a subtle misunderstanding: The
presence
Post by Rainer Gerhards
of "-" turns OFF file syncing. Absence of "-" causes a sync! So in
*.* /path/to/fileWithSync
*.* -/path/to/fileWithoutSync
fileWithSync is being synced and fileWithoutSync is not!
This is traditional config file syntax and probably the main reason
why *a
Post by Rainer Gerhards
lot* of real-world config files do syncs...
I understood correctly and now I'm certain I know how it should work. I need
to do some more testing and I think I have enough information to do that and
will come back here when I have something interesting to share. An
observation I did make when this was happening was that if
$ActionFileEnableSync was disabled (commented out) but the "-" was missing
before the actions in the configuration, write volume to the NAS was higher
than when we had $ActionFileEnableSync disabled and had the "-" added before
the action (write volume was 100% higher if $ActionFileEnableSync was OFF
and the "-" option was missing from all log actions than if
$ActionFileEnableSync was OFF and "-" was included in all log actions.)
That indeed sounds odd. Are you sure the $ActionFileEnableSync is in front of
all write statements? On some systems, write statements may be included via
files and if these are in front of the directive, those would be unaffected
by it...

Rainer
Rainer Gerhards
2009-10-21 13:22:29 UTC
Permalink
-----Original Message-----
From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog-
bounces at lists.adiscon.com] On Behalf Of Aaron Nichols
Sent: Wednesday, October 21, 2009 3:18 PM
To: rsyslog-users
Subject: Re: [rsyslog] Enabling $ActionFileEnableSync
causedmassiveincreaseinwrite volume (bytes/sec) to NAS - cansomeone
helpshed light?
On Wed, Oct 21, 2009 at 1:38 AM, Rainer Gerhards
Post by Rainer Gerhards
See my other note, but there may be a subtle misunderstanding: The
presence
Post by Rainer Gerhards
of "-" turns OFF file syncing. Absence of "-" causes a sync! So in
*.* /path/to/fileWithSync
*.* -/path/to/fileWithoutSync
fileWithSync is being synced and fileWithoutSync is not!
This is traditional config file syntax and probably the main reason
why *a
Post by Rainer Gerhards
lot* of real-world config files do syncs...
I understood correctly and now I'm certain I know how it should work. I need
to do some more testing and I think I have enough information to do that and
will come back here when I have something interesting to share. An
observation I did make when this was happening was that if
$ActionFileEnableSync was disabled (commented out) but the "-" was missing
before the actions in the configuration, write volume to the NAS was higher
than when we had $ActionFileEnableSync disabled and had the "-" added before
the action (write volume was 100% higher if $ActionFileEnableSync was OFF
and the "-" option was missing from all log actions than if
$ActionFileEnableSync was OFF and "-" was included in all log actions.)
That indeed sounds odd. Are you sure the $ActionFileEnableSync is in front of
all write statements? On some systems, write statements may be included via
files and if these are in front of the directive, those would be unaffected
by it...

Rainer
Rainer Gerhards
2009-10-21 13:22:29 UTC
Permalink
-----Original Message-----
From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog-
bounces at lists.adiscon.com] On Behalf Of Aaron Nichols
Sent: Wednesday, October 21, 2009 3:18 PM
To: rsyslog-users
Subject: Re: [rsyslog] Enabling $ActionFileEnableSync
causedmassiveincreaseinwrite volume (bytes/sec) to NAS - cansomeone
helpshed light?
On Wed, Oct 21, 2009 at 1:38 AM, Rainer Gerhards
Post by Rainer Gerhards
See my other note, but there may be a subtle misunderstanding: The
presence
Post by Rainer Gerhards
of "-" turns OFF file syncing. Absence of "-" causes a sync! So in
*.* /path/to/fileWithSync
*.* -/path/to/fileWithoutSync
fileWithSync is being synced and fileWithoutSync is not!
This is traditional config file syntax and probably the main reason
why *a
Post by Rainer Gerhards
lot* of real-world config files do syncs...
I understood correctly and now I'm certain I know how it should work. I need
to do some more testing and I think I have enough information to do that and
will come back here when I have something interesting to share. An
observation I did make when this was happening was that if
$ActionFileEnableSync was disabled (commented out) but the "-" was missing
before the actions in the configuration, write volume to the NAS was higher
than when we had $ActionFileEnableSync disabled and had the "-" added before
the action (write volume was 100% higher if $ActionFileEnableSync was OFF
and the "-" option was missing from all log actions than if
$ActionFileEnableSync was OFF and "-" was included in all log actions.)
That indeed sounds odd. Are you sure the $ActionFileEnableSync is in front of
all write statements? On some systems, write statements may be included via
files and if these are in front of the directive, those would be unaffected
by it...

Rainer
Aaron Nichols
2009-10-21 13:17:49 UTC
Permalink
On Wed, Oct 21, 2009 at 1:38 AM, Rainer Gerhards
Post by Rainer Gerhards
See my other note, but there may be a subtle misunderstanding: The presence
of "-" turns OFF file syncing. Absence of "-" causes a sync! So in
*.* /path/to/fileWithSync
*.* -/path/to/fileWithoutSync
fileWithSync is being synced and fileWithoutSync is not!
This is traditional config file syntax and probably the main reason why *a
lot* of real-world config files do syncs...
I understood correctly and now I'm certain I know how it should work. I need
to do some more testing and I think I have enough information to do that and
will come back here when I have something interesting to share. An
observation I did make when this was happening was that if
$ActionFileEnableSync was disabled (commented out) but the "-" was missing
before the actions in the configuration, write volume to the NAS was higher
than when we had $ActionFileEnableSync disabled and had the "-" added before
the action (write volume was 100% higher if $ActionFileEnableSync was OFF
and the "-" option was missing from all log actions than if
$ActionFileEnableSync was OFF and "-" was included in all log actions.)

This led to my original suspicion that the two options somehow change sync
behavior independently. If they shouldn't, then I need to get the debug log
and do some analysis.

I appreciate the guidance - I'll gather some more details.

Thanks,
Aaron
Aaron Nichols
2009-10-21 13:17:49 UTC
Permalink
On Wed, Oct 21, 2009 at 1:38 AM, Rainer Gerhards
Post by Rainer Gerhards
See my other note, but there may be a subtle misunderstanding: The presence
of "-" turns OFF file syncing. Absence of "-" causes a sync! So in
*.* /path/to/fileWithSync
*.* -/path/to/fileWithoutSync
fileWithSync is being synced and fileWithoutSync is not!
This is traditional config file syntax and probably the main reason why *a
lot* of real-world config files do syncs...
I understood correctly and now I'm certain I know how it should work. I need
to do some more testing and I think I have enough information to do that and
will come back here when I have something interesting to share. An
observation I did make when this was happening was that if
$ActionFileEnableSync was disabled (commented out) but the "-" was missing
before the actions in the configuration, write volume to the NAS was higher
than when we had $ActionFileEnableSync disabled and had the "-" added before
the action (write volume was 100% higher if $ActionFileEnableSync was OFF
and the "-" option was missing from all log actions than if
$ActionFileEnableSync was OFF and "-" was included in all log actions.)

This led to my original suspicion that the two options somehow change sync
behavior independently. If they shouldn't, then I need to get the debug log
and do some analysis.

I appreciate the guidance - I'll gather some more details.

Thanks,
Aaron
Aaron Nichols
2009-10-21 13:17:49 UTC
Permalink
On Wed, Oct 21, 2009 at 1:38 AM, Rainer Gerhards
Post by Rainer Gerhards
See my other note, but there may be a subtle misunderstanding: The presence
of "-" turns OFF file syncing. Absence of "-" causes a sync! So in
*.* /path/to/fileWithSync
*.* -/path/to/fileWithoutSync
fileWithSync is being synced and fileWithoutSync is not!
This is traditional config file syntax and probably the main reason why *a
lot* of real-world config files do syncs...
I understood correctly and now I'm certain I know how it should work. I need
to do some more testing and I think I have enough information to do that and
will come back here when I have something interesting to share. An
observation I did make when this was happening was that if
$ActionFileEnableSync was disabled (commented out) but the "-" was missing
before the actions in the configuration, write volume to the NAS was higher
than when we had $ActionFileEnableSync disabled and had the "-" added before
the action (write volume was 100% higher if $ActionFileEnableSync was OFF
and the "-" option was missing from all log actions than if
$ActionFileEnableSync was OFF and "-" was included in all log actions.)

This led to my original suspicion that the two options somehow change sync
behavior independently. If they shouldn't, then I need to get the debug log
and do some analysis.

I appreciate the guidance - I'll gather some more details.

Thanks,
Aaron
Rainer Gerhards
2009-10-21 07:38:19 UTC
Permalink
-----Original Message-----
From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog-
bounces at lists.adiscon.com] On Behalf Of Aaron Nichols
Sent: Tuesday, October 20, 2009 10:38 PM
To: rsyslog-users
Subject: Re: [rsyslog] Enabling $ActionFileEnableSync caused
massiveincreaseinwrite volume (bytes/sec) to NAS - can someone helpshed
light?
Also - can someone just confirm that my assumption is correct about the
expected behavior? Should the "$ActionFileEnableSync [on/off]" option enable
the ABILITY to turn on file sync per action using "-" or does it do
something different?
See my other note, but there may be a subtle misunderstanding: The presence
of "-" turns OFF file syncing. Absence of "-" causes a sync! So in

*.* /path/to/fileWithSync
*.* -/path/to/fileWithoutSync

fileWithSync is being synced and fileWithoutSync is not!

This is traditional config file syntax and probably the main reason why *a
lot* of real-world config files do syncs...

Rainer
Rainer Gerhards
2009-10-21 07:38:19 UTC
Permalink
-----Original Message-----
From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog-
bounces at lists.adiscon.com] On Behalf Of Aaron Nichols
Sent: Tuesday, October 20, 2009 10:38 PM
To: rsyslog-users
Subject: Re: [rsyslog] Enabling $ActionFileEnableSync caused
massiveincreaseinwrite volume (bytes/sec) to NAS - can someone helpshed
light?
Also - can someone just confirm that my assumption is correct about the
expected behavior? Should the "$ActionFileEnableSync [on/off]" option enable
the ABILITY to turn on file sync per action using "-" or does it do
something different?
See my other note, but there may be a subtle misunderstanding: The presence
of "-" turns OFF file syncing. Absence of "-" causes a sync! So in

*.* /path/to/fileWithSync
*.* -/path/to/fileWithoutSync

fileWithSync is being synced and fileWithoutSync is not!

This is traditional config file syntax and probably the main reason why *a
lot* of real-world config files do syncs...

Rainer
Rainer Gerhards
2009-10-21 07:38:19 UTC
Permalink
-----Original Message-----
From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog-
bounces at lists.adiscon.com] On Behalf Of Aaron Nichols
Sent: Tuesday, October 20, 2009 10:38 PM
To: rsyslog-users
Subject: Re: [rsyslog] Enabling $ActionFileEnableSync caused
massiveincreaseinwrite volume (bytes/sec) to NAS - can someone helpshed
light?
Also - can someone just confirm that my assumption is correct about the
expected behavior? Should the "$ActionFileEnableSync [on/off]" option enable
the ABILITY to turn on file sync per action using "-" or does it do
something different?
See my other note, but there may be a subtle misunderstanding: The presence
of "-" turns OFF file syncing. Absence of "-" causes a sync! So in

*.* /path/to/fileWithSync
*.* -/path/to/fileWithoutSync

fileWithSync is being synced and fileWithoutSync is not!

This is traditional config file syntax and probably the main reason why *a
lot* of real-world config files do syncs...

Rainer
Aaron Nichols
2009-10-20 20:37:43 UTC
Permalink
Also - can someone just confirm that my assumption is correct about the
expected behavior? Should the "$ActionFileEnableSync [on/off]" option enable
the ABILITY to turn on file sync per action using "-" or does it do
something different?

If I have "$ActionFileEnableSync off" set - should I expect adding or
removing "-" before an action to have any impact at all? The documentation
just associates these two options with enabling/disabling "file sync" but it
seems like there are some additional subtleties that aren't clear.

Thanks,
Aaron
On Tue, Oct 20, 2009 at 10:58 AM, Rainer Gerhards <
Post by Rainer Gerhards
Hmmm... OK, so this is a version without output batching support (v5+) and
without background writer (current v4-beta). So I do not really see how
rsyslog could request this writing (other, of course, than by using the sync
calls). Could you create a debug log? I'd try to dig through it. But I'd
actually think that it has something to do with the way NFS parameters are
set.
Aaron Nichols
2009-10-20 20:37:43 UTC
Permalink
Also - can someone just confirm that my assumption is correct about the
expected behavior? Should the "$ActionFileEnableSync [on/off]" option enable
the ABILITY to turn on file sync per action using "-" or does it do
something different?

If I have "$ActionFileEnableSync off" set - should I expect adding or
removing "-" before an action to have any impact at all? The documentation
just associates these two options with enabling/disabling "file sync" but it
seems like there are some additional subtleties that aren't clear.

Thanks,
Aaron
On Tue, Oct 20, 2009 at 10:58 AM, Rainer Gerhards <
Post by Rainer Gerhards
Hmmm... OK, so this is a version without output batching support (v5+) and
without background writer (current v4-beta). So I do not really see how
rsyslog could request this writing (other, of course, than by using the sync
calls). Could you create a debug log? I'd try to dig through it. But I'd
actually think that it has something to do with the way NFS parameters are
set.
Aaron Nichols
2009-10-20 20:37:43 UTC
Permalink
Also - can someone just confirm that my assumption is correct about the
expected behavior? Should the "$ActionFileEnableSync [on/off]" option enable
the ABILITY to turn on file sync per action using "-" or does it do
something different?

If I have "$ActionFileEnableSync off" set - should I expect adding or
removing "-" before an action to have any impact at all? The documentation
just associates these two options with enabling/disabling "file sync" but it
seems like there are some additional subtleties that aren't clear.

Thanks,
Aaron
On Tue, Oct 20, 2009 at 10:58 AM, Rainer Gerhards <
Post by Rainer Gerhards
Hmmm... OK, so this is a version without output batching support (v5+) and
without background writer (current v4-beta). So I do not really see how
rsyslog could request this writing (other, of course, than by using the sync
calls). Could you create a debug log? I'd try to dig through it. But I'd
actually think that it has something to do with the way NFS parameters are
set.
Aaron Nichols
2009-10-20 18:41:44 UTC
Permalink
On Tue, Oct 20, 2009 at 10:58 AM, Rainer Gerhards
Post by Rainer Gerhards
Hmmm... OK, so this is a version without output batching support (v5+) and
without background writer (current v4-beta). So I do not really see how
rsyslog could request this writing (other, of course, than by using the sync
calls). Could you create a debug log? I'd try to dig through it. But I'd
actually think that it has something to do with the way NFS parameters are
set.
Here are the NFS mount options for the nodes writing to the NFS volume. The
NFS server is an EMC Celerra - I can get any specific parameters there are
questions about but can't easily provide a dump of the configured options:

hostname:/log/syslog /log/syslog nfs
rw,vers=3,rsize=32768,wsize=32768,hard,intr,proto=tcp,timeo=600,retrans=2,sec=sys,addr=hostname
0 0

I will work on a debug log but as these are production systems it'll take a
little time.

Thanks,
Aaron
Aaron Nichols
2009-10-20 18:41:44 UTC
Permalink
On Tue, Oct 20, 2009 at 10:58 AM, Rainer Gerhards
Post by Rainer Gerhards
Hmmm... OK, so this is a version without output batching support (v5+) and
without background writer (current v4-beta). So I do not really see how
rsyslog could request this writing (other, of course, than by using the sync
calls). Could you create a debug log? I'd try to dig through it. But I'd
actually think that it has something to do with the way NFS parameters are
set.
Here are the NFS mount options for the nodes writing to the NFS volume. The
NFS server is an EMC Celerra - I can get any specific parameters there are
questions about but can't easily provide a dump of the configured options:

hostname:/log/syslog /log/syslog nfs
rw,vers=3,rsize=32768,wsize=32768,hard,intr,proto=tcp,timeo=600,retrans=2,sec=sys,addr=hostname
0 0

I will work on a debug log but as these are production systems it'll take a
little time.

Thanks,
Aaron
Aaron Nichols
2009-10-20 18:41:44 UTC
Permalink
On Tue, Oct 20, 2009 at 10:58 AM, Rainer Gerhards
Post by Rainer Gerhards
Hmmm... OK, so this is a version without output batching support (v5+) and
without background writer (current v4-beta). So I do not really see how
rsyslog could request this writing (other, of course, than by using the sync
calls). Could you create a debug log? I'd try to dig through it. But I'd
actually think that it has something to do with the way NFS parameters are
set.
Here are the NFS mount options for the nodes writing to the NFS volume. The
NFS server is an EMC Celerra - I can get any specific parameters there are
questions about but can't easily provide a dump of the configured options:

hostname:/log/syslog /log/syslog nfs
rw,vers=3,rsize=32768,wsize=32768,hard,intr,proto=tcp,timeo=600,retrans=2,sec=sys,addr=hostname
0 0

I will work on a debug log but as these are production systems it'll take a
little time.

Thanks,
Aaron
Rainer Gerhards
2009-10-20 16:58:19 UTC
Permalink
-----Original Message-----
From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog-
bounces at lists.adiscon.com] On Behalf Of Aaron Nichols
Sent: Tuesday, October 20, 2009 6:54 PM
To: rsyslog-users
Subject: Re: [rsyslog] Enabling $ActionFileEnableSync caused
massiveincrease inwrite volume (bytes/sec) to NAS - can someone
helpshed light?
On Tue, Oct 20, 2009 at 10:32 AM, Rainer Gerhards
Post by Rainer Gerhards
Hi Aaron,
first and very important question: which version of rsyslog are you
using?
Of course - I forgot to include that, sorry - 4.4.1. Running on RHEL 5.3.
Hmmm... OK, so this is a version without output batching support (v5+) and
without background writer (current v4-beta). So I do not really see how
rsyslog could request this writing (other, of course, than by using the sync
calls). Could you create a debug log? I'd try to dig through it. But I'd
actually think that it has something to do with the way NFS parameters are
set.

Rainer
Rainer Gerhards
2009-10-20 16:58:19 UTC
Permalink
-----Original Message-----
From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog-
bounces at lists.adiscon.com] On Behalf Of Aaron Nichols
Sent: Tuesday, October 20, 2009 6:54 PM
To: rsyslog-users
Subject: Re: [rsyslog] Enabling $ActionFileEnableSync caused
massiveincrease inwrite volume (bytes/sec) to NAS - can someone
helpshed light?
On Tue, Oct 20, 2009 at 10:32 AM, Rainer Gerhards
Post by Rainer Gerhards
Hi Aaron,
first and very important question: which version of rsyslog are you
using?
Of course - I forgot to include that, sorry - 4.4.1. Running on RHEL 5.3.
Hmmm... OK, so this is a version without output batching support (v5+) and
without background writer (current v4-beta). So I do not really see how
rsyslog could request this writing (other, of course, than by using the sync
calls). Could you create a debug log? I'd try to dig through it. But I'd
actually think that it has something to do with the way NFS parameters are
set.

Rainer
Rainer Gerhards
2009-10-20 16:58:19 UTC
Permalink
-----Original Message-----
From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog-
bounces at lists.adiscon.com] On Behalf Of Aaron Nichols
Sent: Tuesday, October 20, 2009 6:54 PM
To: rsyslog-users
Subject: Re: [rsyslog] Enabling $ActionFileEnableSync caused
massiveincrease inwrite volume (bytes/sec) to NAS - can someone
helpshed light?
On Tue, Oct 20, 2009 at 10:32 AM, Rainer Gerhards
Post by Rainer Gerhards
Hi Aaron,
first and very important question: which version of rsyslog are you
using?
Of course - I forgot to include that, sorry - 4.4.1. Running on RHEL 5.3.
Hmmm... OK, so this is a version without output batching support (v5+) and
without background writer (current v4-beta). So I do not really see how
rsyslog could request this writing (other, of course, than by using the sync
calls). Could you create a debug log? I'd try to dig through it. But I'd
actually think that it has something to do with the way NFS parameters are
set.

Rainer
Aaron Nichols
2009-10-20 16:54:02 UTC
Permalink
On Tue, Oct 20, 2009 at 10:32 AM, Rainer Gerhards
Post by Rainer Gerhards
Hi Aaron,
first and very important question: which version of rsyslog are you using?
Of course - I forgot to include that, sorry - 4.4.1. Running on RHEL 5.3.
Aaron Nichols
2009-10-20 16:54:02 UTC
Permalink
On Tue, Oct 20, 2009 at 10:32 AM, Rainer Gerhards
Post by Rainer Gerhards
Hi Aaron,
first and very important question: which version of rsyslog are you using?
Of course - I forgot to include that, sorry - 4.4.1. Running on RHEL 5.3.
Aaron Nichols
2009-10-20 16:54:02 UTC
Permalink
On Tue, Oct 20, 2009 at 10:32 AM, Rainer Gerhards
Post by Rainer Gerhards
Hi Aaron,
first and very important question: which version of rsyslog are you using?
Of course - I forgot to include that, sorry - 4.4.1. Running on RHEL 5.3.
david
2009-10-21 05:12:39 UTC
Permalink
hmm, how large are the log entries? I'm wondering if the fact that every
log entry then updates the metadata (size, time modified, etc) could
account for most, if not all, of your extra traffic.

the other thing that you could be seeing is if the NFS server records
write traffic based on # of blocks written * block size.

when doing a sync after each write you will do a lot of blocks (but
re-write the same block many times), if it counts each of these writes as
full block that would greatly magnify the write totals.

I think you will really like the message batching in v5. you configure it
to output every X messages or Y ms, whichever comes first. doing a sync
after every write will have _far_ less effect.

David Lang
Date: Tue, 20 Oct 2009 09:32:07 -0600
From: Aaron Nichols <anichols at trumped.org>
Reply-To: rsyslog-users <rsyslog at lists.adiscon.com>
To: rsyslog-users <rsyslog at lists.adiscon.com>
Subject: [rsyslog] Enabling $ActionFileEnableSync caused massive increase in
write volume (bytes/sec) to NAS - can someone help shed light?
Hello,
We have two rsyslog "writer" instances which are receiving syslog
messages and writing them to an NFS mount on a NAS. The NAS is connected via
10gb ethernet but has a number of other applications writing to it as well.
Typical write volume for our log writers is 2-3Mbps to the NAS. Yesterday I
attempted to resolve some complaints with logs being "bursty" - events
appear to be written in batches which sounded consistent with buffering on
the rsyslog daemon or not syncing the file with every message. These bursts
could come as infrequently as every 30 seconds however, which seems
excessive. I took a number of steps to remedy this but ultimately had to
reverse the changes because write volume (measured at the disks) to the NAS
went from 2Mbps up to 9Mbps and iops went from 30 up to 900 at the peak. I'm
sure there is an explanation for this - but there are some subtle
configuration changes which had an impact and I don't understand exactly
why.
First, the main change was to set "$ActionFileEnableSync on" in each logging
servers configuration. From what I can tell this globally enables the
ability to turn on file sync and it defaults to "off". This change, combined
with removing the "-" from all the actions in my configuration should have
resulted in all files being synced all the time - probably the worst case
scenario and I'm not all that suprised this caused problems.
Second, before rolling the change back 100% - I removed the
"$ActionFileEnableSync on" configuration from the servers but left the
action configurations without the "-" - assuming that the global
configuration option would completely disable file sync and using a "-"
would have no effect. This did not seem to be the case - it did dramatically
decrease the volume of traffic to the NAS but still the volume was about
double (5Mbps) the normal volume.
Third, I added "-" before all the action lines in the configuration and this
brought volume back down to the 2-3Mbps we are used to.
1) I would expect file sync to increase transactions with the NAS but not
increase the volume of data being written to the NAS as measured at the
disk. Can someone shed light on why this would so profoundly impact write
volume? Load on the rsyslog servers also went down substantially after
making this change - presumably because there were fewer queued transactions
- not sure.
2) Is there some difference between what "$ActionFileEnableSync on" does vs.
adding a "-" before an action other than one is global and one is
per-action? I thought these were just two different levels of granularity
for configuring the same thing but that doesn't appear to be the case.
3) Is it expected to have rsyslog take up to 30 seconds to flush messages to
a logfile? These are BUSY logs - being written to multiple times per second,
so it's not a delay on the client. I can observe these same clients logging
messages to another logging server (the old server) running syslog-ng and
messages are observable in near real-time.
Ultimate - #3 is the problem I am trying to solve, but I'm very curious why
these changes had the impact they did and why globally disabling file sync
using "$ActionFileEnableSync" would not entire decrease log volume until I
add "-" to all the actions.
Let me know if I can add info that would help shed light on this.
Thanks,
Aaron
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com
Aaron Nichols
2009-10-21 05:28:48 UTC
Permalink
Post by david
hmm, how large are the log entries? I'm wondering if the fact that every
log entry then updates the metadata (size, time modified, etc) could
account for most, if not all, of your extra traffic.
Most log entries are small (under 2k) but there is one log destination which
produces very large log entries. We have had to increase the max message
size to 64k to accommodate these messages. This is only one destination but
comes from multiple sources (collated into a single file) - so with a better
understanding of how the "$ActionFileEnableSync" option vs. using "-" on a
particular action works, I may be able to enable sync for log destinations
other than that one high volume, large message size destination.
Post by david
the other thing that you could be seeing is if the NFS server records
write traffic based on # of blocks written * block size.
when doing a sync after each write you will do a lot of blocks (but
re-write the same block many times), if it counts each of these writes as
full block that would greatly magnify the write totals.
I think you will really like the message batching in v5. you configure it
to output every X messages or Y ms, whichever comes first. doing a sync
after every write will have _far_ less effect.
I probably will enjoy v5 but am not yet ready to run it in production. I
need to find a good balance between log update frequency and keeping NAS
performance sane. I did pass this feedback along to our storage guys to see
if there is any tuning we can do around this.

I will also be gathering a debug log soon - awaiting approvals to run the
test. I assume the debug output would show some indication of where this
might be causing problems?

Thanks again for the feedback. I'm pushing for a support contract as I get
the feeling it'll come in handy as well and I want to support the project,
but I appreciate the feedback in this forum, I know everyone is hard at
work.

Thanks,
Aaron
Rainer Gerhards
2009-10-21 07:33:22 UTC
Permalink
-----Original Message-----
From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog-
bounces at lists.adiscon.com] On Behalf Of Aaron Nichols
Sent: Wednesday, October 21, 2009 7:29 AM
To: rsyslog-users
Subject: Re: [rsyslog] Enabling $ActionFileEnableSync caused
massiveincrease in write volume (bytes/sec) to NAS - can someonehelp
shed light?
Post by david
hmm, how large are the log entries? I'm wondering if the fact that
every
Post by david
log entry then updates the metadata (size, time modified, etc) could
account for most, if not all, of your extra traffic.
Most log entries are small (under 2k) but there is one log destination which
produces very large log entries. We have had to increase the max message
size to 64k to accommodate these messages. This is only one destination but
comes from multiple sources (collated into a single file) - so with a better
understanding of how the "$ActionFileEnableSync" option vs. using "-" on a
particular action works, I may be able to enable sync for log
destinations
other than that one high volume, large message size destination.
Sorry, I forgot to answer that in my posting yesterday. $ActionFileEnableSync
is position dependent. So I assume that it is giving right at the top of
rsyslog.conf, before any action. If set to on, it *enables* the sync
facility. So you can use the dash in front of the file to specify whether or
not a sync should happen. If it is set to "off" (the default), the
"dash-rule" is always ignored and a sync never happens. This was done because
many default configs have the sync option set just out of habit and the
performance toll, as you have seen, is immense.
Post by david
the other thing that you could be seeing is if the NFS server records
write traffic based on # of blocks written * block size.
when doing a sync after each write you will do a lot of blocks (but
re-write the same block many times), if it counts each of these
writes as
Post by david
full block that would greatly magnify the write totals.
I think you will really like the message batching in v5. you
configure it
Post by david
to output every X messages or Y ms, whichever comes first. doing a
sync
Post by david
after every write will have _far_ less effect.
I probably will enjoy v5 but am not yet ready to run it in production. I
need to find a good balance between log update frequency and keeping NAS
performance sane. I did pass this feedback along to our storage guys to see
if there is any tuning we can do around this.
I will also be gathering a debug log soon - awaiting approvals to run the
test. I assume the debug output would show some indication of where this
might be causing problems?
My primary interest in the debug log is to see if something works other than
I would expect it. If that is not the case, you need to dig down into what
NFS does to cause this behavior. Doing an strace may also be a good idea,
depending on the output of the debug log. The core problem from my POV is
that in 4.4.1 there is no code that could create the behavior you describe
(at least not to the best of my knowledge, and this is what the debug log
helps verify ;)).

Rainer
Rainer Gerhards
2009-10-21 07:33:22 UTC
Permalink
-----Original Message-----
From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog-
bounces at lists.adiscon.com] On Behalf Of Aaron Nichols
Sent: Wednesday, October 21, 2009 7:29 AM
To: rsyslog-users
Subject: Re: [rsyslog] Enabling $ActionFileEnableSync caused
massiveincrease in write volume (bytes/sec) to NAS - can someonehelp
shed light?
Post by david
hmm, how large are the log entries? I'm wondering if the fact that
every
Post by david
log entry then updates the metadata (size, time modified, etc) could
account for most, if not all, of your extra traffic.
Most log entries are small (under 2k) but there is one log destination which
produces very large log entries. We have had to increase the max message
size to 64k to accommodate these messages. This is only one destination but
comes from multiple sources (collated into a single file) - so with a better
understanding of how the "$ActionFileEnableSync" option vs. using "-" on a
particular action works, I may be able to enable sync for log
destinations
other than that one high volume, large message size destination.
Sorry, I forgot to answer that in my posting yesterday. $ActionFileEnableSync
is position dependent. So I assume that it is giving right at the top of
rsyslog.conf, before any action. If set to on, it *enables* the sync
facility. So you can use the dash in front of the file to specify whether or
not a sync should happen. If it is set to "off" (the default), the
"dash-rule" is always ignored and a sync never happens. This was done because
many default configs have the sync option set just out of habit and the
performance toll, as you have seen, is immense.
Post by david
the other thing that you could be seeing is if the NFS server records
write traffic based on # of blocks written * block size.
when doing a sync after each write you will do a lot of blocks (but
re-write the same block many times), if it counts each of these
writes as
Post by david
full block that would greatly magnify the write totals.
I think you will really like the message batching in v5. you
configure it
Post by david
to output every X messages or Y ms, whichever comes first. doing a
sync
Post by david
after every write will have _far_ less effect.
I probably will enjoy v5 but am not yet ready to run it in production. I
need to find a good balance between log update frequency and keeping NAS
performance sane. I did pass this feedback along to our storage guys to see
if there is any tuning we can do around this.
I will also be gathering a debug log soon - awaiting approvals to run the
test. I assume the debug output would show some indication of where this
might be causing problems?
My primary interest in the debug log is to see if something works other than
I would expect it. If that is not the case, you need to dig down into what
NFS does to cause this behavior. Doing an strace may also be a good idea,
depending on the output of the debug log. The core problem from my POV is
that in 4.4.1 there is no code that could create the behavior you describe
(at least not to the best of my knowledge, and this is what the debug log
helps verify ;)).

Rainer
Rainer Gerhards
2009-10-21 07:33:22 UTC
Permalink
-----Original Message-----
From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog-
bounces at lists.adiscon.com] On Behalf Of Aaron Nichols
Sent: Wednesday, October 21, 2009 7:29 AM
To: rsyslog-users
Subject: Re: [rsyslog] Enabling $ActionFileEnableSync caused
massiveincrease in write volume (bytes/sec) to NAS - can someonehelp
shed light?
Post by david
hmm, how large are the log entries? I'm wondering if the fact that
every
Post by david
log entry then updates the metadata (size, time modified, etc) could
account for most, if not all, of your extra traffic.
Most log entries are small (under 2k) but there is one log destination which
produces very large log entries. We have had to increase the max message
size to 64k to accommodate these messages. This is only one destination but
comes from multiple sources (collated into a single file) - so with a better
understanding of how the "$ActionFileEnableSync" option vs. using "-" on a
particular action works, I may be able to enable sync for log
destinations
other than that one high volume, large message size destination.
Sorry, I forgot to answer that in my posting yesterday. $ActionFileEnableSync
is position dependent. So I assume that it is giving right at the top of
rsyslog.conf, before any action. If set to on, it *enables* the sync
facility. So you can use the dash in front of the file to specify whether or
not a sync should happen. If it is set to "off" (the default), the
"dash-rule" is always ignored and a sync never happens. This was done because
many default configs have the sync option set just out of habit and the
performance toll, as you have seen, is immense.
Post by david
the other thing that you could be seeing is if the NFS server records
write traffic based on # of blocks written * block size.
when doing a sync after each write you will do a lot of blocks (but
re-write the same block many times), if it counts each of these
writes as
Post by david
full block that would greatly magnify the write totals.
I think you will really like the message batching in v5. you
configure it
Post by david
to output every X messages or Y ms, whichever comes first. doing a
sync
Post by david
after every write will have _far_ less effect.
I probably will enjoy v5 but am not yet ready to run it in production. I
need to find a good balance between log update frequency and keeping NAS
performance sane. I did pass this feedback along to our storage guys to see
if there is any tuning we can do around this.
I will also be gathering a debug log soon - awaiting approvals to run the
test. I assume the debug output would show some indication of where this
might be causing problems?
My primary interest in the debug log is to see if something works other than
I would expect it. If that is not the case, you need to dig down into what
NFS does to cause this behavior. Doing an strace may also be a good idea,
depending on the output of the debug log. The core problem from my POV is
that in 4.4.1 there is no code that could create the behavior you describe
(at least not to the best of my knowledge, and this is what the debug log
helps verify ;)).

Rainer
Rainer Gerhards
2009-10-21 07:33:22 UTC
Permalink
-----Original Message-----
From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog-
bounces at lists.adiscon.com] On Behalf Of Aaron Nichols
Sent: Wednesday, October 21, 2009 7:29 AM
To: rsyslog-users
Subject: Re: [rsyslog] Enabling $ActionFileEnableSync caused
massiveincrease in write volume (bytes/sec) to NAS - can someonehelp
shed light?
Post by david
hmm, how large are the log entries? I'm wondering if the fact that
every
Post by david
log entry then updates the metadata (size, time modified, etc) could
account for most, if not all, of your extra traffic.
Most log entries are small (under 2k) but there is one log destination which
produces very large log entries. We have had to increase the max message
size to 64k to accommodate these messages. This is only one destination but
comes from multiple sources (collated into a single file) - so with a better
understanding of how the "$ActionFileEnableSync" option vs. using "-" on a
particular action works, I may be able to enable sync for log
destinations
other than that one high volume, large message size destination.
Sorry, I forgot to answer that in my posting yesterday. $ActionFileEnableSync
is position dependent. So I assume that it is giving right at the top of
rsyslog.conf, before any action. If set to on, it *enables* the sync
facility. So you can use the dash in front of the file to specify whether or
not a sync should happen. If it is set to "off" (the default), the
"dash-rule" is always ignored and a sync never happens. This was done because
many default configs have the sync option set just out of habit and the
performance toll, as you have seen, is immense.
Post by david
the other thing that you could be seeing is if the NFS server records
write traffic based on # of blocks written * block size.
when doing a sync after each write you will do a lot of blocks (but
re-write the same block many times), if it counts each of these
writes as
Post by david
full block that would greatly magnify the write totals.
I think you will really like the message batching in v5. you
configure it
Post by david
to output every X messages or Y ms, whichever comes first. doing a
sync
Post by david
after every write will have _far_ less effect.
I probably will enjoy v5 but am not yet ready to run it in production. I
need to find a good balance between log update frequency and keeping NAS
performance sane. I did pass this feedback along to our storage guys to see
if there is any tuning we can do around this.
I will also be gathering a debug log soon - awaiting approvals to run the
test. I assume the debug output would show some indication of where this
might be causing problems?
My primary interest in the debug log is to see if something works other than
I would expect it. If that is not the case, you need to dig down into what
NFS does to cause this behavior. Doing an strace may also be a good idea,
depending on the output of the debug log. The core problem from my POV is
that in 4.4.1 there is no code that could create the behavior you describe
(at least not to the best of my knowledge, and this is what the debug log
helps verify ;)).

Rainer
Rainer Gerhards
2009-10-21 06:59:48 UTC
Permalink
-----Original Message-----
From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog-
bounces at lists.adiscon.com] On Behalf Of david at lang.hm
Sent: Wednesday, October 21, 2009 7:13 AM
To: rsyslog-users
Subject: Re: [rsyslog] Enabling $ActionFileEnableSync caused massive
increase in write volume (bytes/sec) to NAS - can someone help shed
light?
hmm, how large are the log entries? I'm wondering if the fact that every
log entry then updates the metadata (size, time modified, etc) could
account for most, if not all, of your extra traffic.
rsyslog does an fdatasync() to avoid this problem. However, that API is not
available on all platforms...

Rainer
Aaron Nichols
2009-10-21 05:28:48 UTC
Permalink
Post by david
hmm, how large are the log entries? I'm wondering if the fact that every
log entry then updates the metadata (size, time modified, etc) could
account for most, if not all, of your extra traffic.
Most log entries are small (under 2k) but there is one log destination which
produces very large log entries. We have had to increase the max message
size to 64k to accommodate these messages. This is only one destination but
comes from multiple sources (collated into a single file) - so with a better
understanding of how the "$ActionFileEnableSync" option vs. using "-" on a
particular action works, I may be able to enable sync for log destinations
other than that one high volume, large message size destination.
Post by david
the other thing that you could be seeing is if the NFS server records
write traffic based on # of blocks written * block size.
when doing a sync after each write you will do a lot of blocks (but
re-write the same block many times), if it counts each of these writes as
full block that would greatly magnify the write totals.
I think you will really like the message batching in v5. you configure it
to output every X messages or Y ms, whichever comes first. doing a sync
after every write will have _far_ less effect.
I probably will enjoy v5 but am not yet ready to run it in production. I
need to find a good balance between log update frequency and keeping NAS
performance sane. I did pass this feedback along to our storage guys to see
if there is any tuning we can do around this.

I will also be gathering a debug log soon - awaiting approvals to run the
test. I assume the debug output would show some indication of where this
might be causing problems?

Thanks again for the feedback. I'm pushing for a support contract as I get
the feeling it'll come in handy as well and I want to support the project,
but I appreciate the feedback in this forum, I know everyone is hard at
work.

Thanks,
Aaron
Rainer Gerhards
2009-10-21 06:59:48 UTC
Permalink
-----Original Message-----
From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog-
bounces at lists.adiscon.com] On Behalf Of david at lang.hm
Sent: Wednesday, October 21, 2009 7:13 AM
To: rsyslog-users
Subject: Re: [rsyslog] Enabling $ActionFileEnableSync caused massive
increase in write volume (bytes/sec) to NAS - can someone help shed
light?
hmm, how large are the log entries? I'm wondering if the fact that every
log entry then updates the metadata (size, time modified, etc) could
account for most, if not all, of your extra traffic.
rsyslog does an fdatasync() to avoid this problem. However, that API is not
available on all platforms...

Rainer
Aaron Nichols
2009-10-21 05:28:48 UTC
Permalink
Post by david
hmm, how large are the log entries? I'm wondering if the fact that every
log entry then updates the metadata (size, time modified, etc) could
account for most, if not all, of your extra traffic.
Most log entries are small (under 2k) but there is one log destination which
produces very large log entries. We have had to increase the max message
size to 64k to accommodate these messages. This is only one destination but
comes from multiple sources (collated into a single file) - so with a better
understanding of how the "$ActionFileEnableSync" option vs. using "-" on a
particular action works, I may be able to enable sync for log destinations
other than that one high volume, large message size destination.
Post by david
the other thing that you could be seeing is if the NFS server records
write traffic based on # of blocks written * block size.
when doing a sync after each write you will do a lot of blocks (but
re-write the same block many times), if it counts each of these writes as
full block that would greatly magnify the write totals.
I think you will really like the message batching in v5. you configure it
to output every X messages or Y ms, whichever comes first. doing a sync
after every write will have _far_ less effect.
I probably will enjoy v5 but am not yet ready to run it in production. I
need to find a good balance between log update frequency and keeping NAS
performance sane. I did pass this feedback along to our storage guys to see
if there is any tuning we can do around this.

I will also be gathering a debug log soon - awaiting approvals to run the
test. I assume the debug output would show some indication of where this
might be causing problems?

Thanks again for the feedback. I'm pushing for a support contract as I get
the feeling it'll come in handy as well and I want to support the project,
but I appreciate the feedback in this forum, I know everyone is hard at
work.

Thanks,
Aaron
Rainer Gerhards
2009-10-21 06:59:48 UTC
Permalink
-----Original Message-----
From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog-
bounces at lists.adiscon.com] On Behalf Of david at lang.hm
Sent: Wednesday, October 21, 2009 7:13 AM
To: rsyslog-users
Subject: Re: [rsyslog] Enabling $ActionFileEnableSync caused massive
increase in write volume (bytes/sec) to NAS - can someone help shed
light?
hmm, how large are the log entries? I'm wondering if the fact that every
log entry then updates the metadata (size, time modified, etc) could
account for most, if not all, of your extra traffic.
rsyslog does an fdatasync() to avoid this problem. However, that API is not
available on all platforms...

Rainer
Aaron Nichols
2009-10-21 05:28:48 UTC
Permalink
Post by david
hmm, how large are the log entries? I'm wondering if the fact that every
log entry then updates the metadata (size, time modified, etc) could
account for most, if not all, of your extra traffic.
Most log entries are small (under 2k) but there is one log destination which
produces very large log entries. We have had to increase the max message
size to 64k to accommodate these messages. This is only one destination but
comes from multiple sources (collated into a single file) - so with a better
understanding of how the "$ActionFileEnableSync" option vs. using "-" on a
particular action works, I may be able to enable sync for log destinations
other than that one high volume, large message size destination.
Post by david
the other thing that you could be seeing is if the NFS server records
write traffic based on # of blocks written * block size.
when doing a sync after each write you will do a lot of blocks (but
re-write the same block many times), if it counts each of these writes as
full block that would greatly magnify the write totals.
I think you will really like the message batching in v5. you configure it
to output every X messages or Y ms, whichever comes first. doing a sync
after every write will have _far_ less effect.
I probably will enjoy v5 but am not yet ready to run it in production. I
need to find a good balance between log update frequency and keeping NAS
performance sane. I did pass this feedback along to our storage guys to see
if there is any tuning we can do around this.

I will also be gathering a debug log soon - awaiting approvals to run the
test. I assume the debug output would show some indication of where this
might be causing problems?

Thanks again for the feedback. I'm pushing for a support contract as I get
the feeling it'll come in handy as well and I want to support the project,
but I appreciate the feedback in this forum, I know everyone is hard at
work.

Thanks,
Aaron
Rainer Gerhards
2009-10-21 06:59:48 UTC
Permalink
-----Original Message-----
From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog-
bounces at lists.adiscon.com] On Behalf Of david at lang.hm
Sent: Wednesday, October 21, 2009 7:13 AM
To: rsyslog-users
Subject: Re: [rsyslog] Enabling $ActionFileEnableSync caused massive
increase in write volume (bytes/sec) to NAS - can someone help shed
light?
hmm, how large are the log entries? I'm wondering if the fact that every
log entry then updates the metadata (size, time modified, etc) could
account for most, if not all, of your extra traffic.
rsyslog does an fdatasync() to avoid this problem. However, that API is not
available on all platforms...

Rainer
Aaron Nichols
2009-10-20 15:32:07 UTC
Permalink
Hello,
We have two rsyslog "writer" instances which are receiving syslog
messages and writing them to an NFS mount on a NAS. The NAS is connected via
10gb ethernet but has a number of other applications writing to it as well.
Typical write volume for our log writers is 2-3Mbps to the NAS. Yesterday I
attempted to resolve some complaints with logs being "bursty" - events
appear to be written in batches which sounded consistent with buffering on
the rsyslog daemon or not syncing the file with every message. These bursts
could come as infrequently as every 30 seconds however, which seems
excessive. I took a number of steps to remedy this but ultimately had to
reverse the changes because write volume (measured at the disks) to the NAS
went from 2Mbps up to 9Mbps and iops went from 30 up to 900 at the peak. I'm
sure there is an explanation for this - but there are some subtle
configuration changes which had an impact and I don't understand exactly
why.

First, the main change was to set "$ActionFileEnableSync on" in each logging
servers configuration. From what I can tell this globally enables the
ability to turn on file sync and it defaults to "off". This change, combined
with removing the "-" from all the actions in my configuration should have
resulted in all files being synced all the time - probably the worst case
scenario and I'm not all that suprised this caused problems.

Second, before rolling the change back 100% - I removed the
"$ActionFileEnableSync on" configuration from the servers but left the
action configurations without the "-" - assuming that the global
configuration option would completely disable file sync and using a "-"
would have no effect. This did not seem to be the case - it did dramatically
decrease the volume of traffic to the NAS but still the volume was about
double (5Mbps) the normal volume.

Third, I added "-" before all the action lines in the configuration and this
brought volume back down to the 2-3Mbps we are used to.

This leaves me with a few questions:

1) I would expect file sync to increase transactions with the NAS but not
increase the volume of data being written to the NAS as measured at the
disk. Can someone shed light on why this would so profoundly impact write
volume? Load on the rsyslog servers also went down substantially after
making this change - presumably because there were fewer queued transactions
- not sure.

2) Is there some difference between what "$ActionFileEnableSync on" does vs.
adding a "-" before an action other than one is global and one is
per-action? I thought these were just two different levels of granularity
for configuring the same thing but that doesn't appear to be the case.

3) Is it expected to have rsyslog take up to 30 seconds to flush messages to
a logfile? These are BUSY logs - being written to multiple times per second,
so it's not a delay on the client. I can observe these same clients logging
messages to another logging server (the old server) running syslog-ng and
messages are observable in near real-time.

Ultimate - #3 is the problem I am trying to solve, but I'm very curious why
these changes had the impact they did and why globally disabling file sync
using "$ActionFileEnableSync" would not entire decrease log volume until I
add "-" to all the actions.

Let me know if I can add info that would help shed light on this.

Thanks,
Aaron
Rainer Gerhards
2009-10-20 16:32:30 UTC
Permalink
Hi Aaron,

first and very important question: which version of rsyslog are you using?

Rainer
-----Original Message-----
From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog-
bounces at lists.adiscon.com] On Behalf Of Aaron Nichols
Sent: Tuesday, October 20, 2009 5:32 PM
To: rsyslog-users
Subject: [rsyslog] Enabling $ActionFileEnableSync caused massive
increase inwrite volume (bytes/sec) to NAS - can someone help shed
light?
Hello,
We have two rsyslog "writer" instances which are receiving syslog
messages and writing them to an NFS mount on a NAS. The NAS is
connected via
10gb ethernet but has a number of other applications writing to it as well.
Typical write volume for our log writers is 2-3Mbps to the NAS. Yesterday I
attempted to resolve some complaints with logs being "bursty" - events
appear to be written in batches which sounded consistent with buffering on
the rsyslog daemon or not syncing the file with every message. These bursts
could come as infrequently as every 30 seconds however, which seems
excessive. I took a number of steps to remedy this but ultimately had to
reverse the changes because write volume (measured at the disks) to the NAS
went from 2Mbps up to 9Mbps and iops went from 30 up to 900 at the peak. I'm
sure there is an explanation for this - but there are some subtle
configuration changes which had an impact and I don't understand exactly
why.
First, the main change was to set "$ActionFileEnableSync on" in each logging
servers configuration. From what I can tell this globally enables the
ability to turn on file sync and it defaults to "off". This change, combined
with removing the "-" from all the actions in my configuration should have
resulted in all files being synced all the time - probably the worst case
scenario and I'm not all that suprised this caused problems.
Second, before rolling the change back 100% - I removed the
"$ActionFileEnableSync on" configuration from the servers but left the
action configurations without the "-" - assuming that the global
configuration option would completely disable file sync and using a "-"
would have no effect. This did not seem to be the case - it did dramatically
decrease the volume of traffic to the NAS but still the volume was about
double (5Mbps) the normal volume.
Third, I added "-" before all the action lines in the configuration and this
brought volume back down to the 2-3Mbps we are used to.
1) I would expect file sync to increase transactions with the NAS but not
increase the volume of data being written to the NAS as measured at the
disk. Can someone shed light on why this would so profoundly impact write
volume? Load on the rsyslog servers also went down substantially after
making this change - presumably because there were fewer queued transactions
- not sure.
2) Is there some difference between what "$ActionFileEnableSync on" does vs.
adding a "-" before an action other than one is global and one is
per-action? I thought these were just two different levels of
granularity
for configuring the same thing but that doesn't appear to be the case.
3) Is it expected to have rsyslog take up to 30 seconds to flush messages to
a logfile? These are BUSY logs - being written to multiple times per second,
so it's not a delay on the client. I can observe these same clients logging
messages to another logging server (the old server) running syslog-ng and
messages are observable in near real-time.
Ultimate - #3 is the problem I am trying to solve, but I'm very curious why
these changes had the impact they did and why globally disabling file sync
using "$ActionFileEnableSync" would not entire decrease log volume until I
add "-" to all the actions.
Let me know if I can add info that would help shed light on this.
Thanks,
Aaron
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com
david
2009-10-21 05:12:39 UTC
Permalink
hmm, how large are the log entries? I'm wondering if the fact that every
log entry then updates the metadata (size, time modified, etc) could
account for most, if not all, of your extra traffic.

the other thing that you could be seeing is if the NFS server records
write traffic based on # of blocks written * block size.

when doing a sync after each write you will do a lot of blocks (but
re-write the same block many times), if it counts each of these writes as
full block that would greatly magnify the write totals.

I think you will really like the message batching in v5. you configure it
to output every X messages or Y ms, whichever comes first. doing a sync
after every write will have _far_ less effect.

David Lang
Date: Tue, 20 Oct 2009 09:32:07 -0600
From: Aaron Nichols <anichols at trumped.org>
Reply-To: rsyslog-users <rsyslog at lists.adiscon.com>
To: rsyslog-users <rsyslog at lists.adiscon.com>
Subject: [rsyslog] Enabling $ActionFileEnableSync caused massive increase in
write volume (bytes/sec) to NAS - can someone help shed light?
Hello,
We have two rsyslog "writer" instances which are receiving syslog
messages and writing them to an NFS mount on a NAS. The NAS is connected via
10gb ethernet but has a number of other applications writing to it as well.
Typical write volume for our log writers is 2-3Mbps to the NAS. Yesterday I
attempted to resolve some complaints with logs being "bursty" - events
appear to be written in batches which sounded consistent with buffering on
the rsyslog daemon or not syncing the file with every message. These bursts
could come as infrequently as every 30 seconds however, which seems
excessive. I took a number of steps to remedy this but ultimately had to
reverse the changes because write volume (measured at the disks) to the NAS
went from 2Mbps up to 9Mbps and iops went from 30 up to 900 at the peak. I'm
sure there is an explanation for this - but there are some subtle
configuration changes which had an impact and I don't understand exactly
why.
First, the main change was to set "$ActionFileEnableSync on" in each logging
servers configuration. From what I can tell this globally enables the
ability to turn on file sync and it defaults to "off". This change, combined
with removing the "-" from all the actions in my configuration should have
resulted in all files being synced all the time - probably the worst case
scenario and I'm not all that suprised this caused problems.
Second, before rolling the change back 100% - I removed the
"$ActionFileEnableSync on" configuration from the servers but left the
action configurations without the "-" - assuming that the global
configuration option would completely disable file sync and using a "-"
would have no effect. This did not seem to be the case - it did dramatically
decrease the volume of traffic to the NAS but still the volume was about
double (5Mbps) the normal volume.
Third, I added "-" before all the action lines in the configuration and this
brought volume back down to the 2-3Mbps we are used to.
1) I would expect file sync to increase transactions with the NAS but not
increase the volume of data being written to the NAS as measured at the
disk. Can someone shed light on why this would so profoundly impact write
volume? Load on the rsyslog servers also went down substantially after
making this change - presumably because there were fewer queued transactions
- not sure.
2) Is there some difference between what "$ActionFileEnableSync on" does vs.
adding a "-" before an action other than one is global and one is
per-action? I thought these were just two different levels of granularity
for configuring the same thing but that doesn't appear to be the case.
3) Is it expected to have rsyslog take up to 30 seconds to flush messages to
a logfile? These are BUSY logs - being written to multiple times per second,
so it's not a delay on the client. I can observe these same clients logging
messages to another logging server (the old server) running syslog-ng and
messages are observable in near real-time.
Ultimate - #3 is the problem I am trying to solve, but I'm very curious why
these changes had the impact they did and why globally disabling file sync
using "$ActionFileEnableSync" would not entire decrease log volume until I
add "-" to all the actions.
Let me know if I can add info that would help shed light on this.
Thanks,
Aaron
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com
Aaron Nichols
2009-10-20 15:32:07 UTC
Permalink
Hello,
We have two rsyslog "writer" instances which are receiving syslog
messages and writing them to an NFS mount on a NAS. The NAS is connected via
10gb ethernet but has a number of other applications writing to it as well.
Typical write volume for our log writers is 2-3Mbps to the NAS. Yesterday I
attempted to resolve some complaints with logs being "bursty" - events
appear to be written in batches which sounded consistent with buffering on
the rsyslog daemon or not syncing the file with every message. These bursts
could come as infrequently as every 30 seconds however, which seems
excessive. I took a number of steps to remedy this but ultimately had to
reverse the changes because write volume (measured at the disks) to the NAS
went from 2Mbps up to 9Mbps and iops went from 30 up to 900 at the peak. I'm
sure there is an explanation for this - but there are some subtle
configuration changes which had an impact and I don't understand exactly
why.

First, the main change was to set "$ActionFileEnableSync on" in each logging
servers configuration. From what I can tell this globally enables the
ability to turn on file sync and it defaults to "off". This change, combined
with removing the "-" from all the actions in my configuration should have
resulted in all files being synced all the time - probably the worst case
scenario and I'm not all that suprised this caused problems.

Second, before rolling the change back 100% - I removed the
"$ActionFileEnableSync on" configuration from the servers but left the
action configurations without the "-" - assuming that the global
configuration option would completely disable file sync and using a "-"
would have no effect. This did not seem to be the case - it did dramatically
decrease the volume of traffic to the NAS but still the volume was about
double (5Mbps) the normal volume.

Third, I added "-" before all the action lines in the configuration and this
brought volume back down to the 2-3Mbps we are used to.

This leaves me with a few questions:

1) I would expect file sync to increase transactions with the NAS but not
increase the volume of data being written to the NAS as measured at the
disk. Can someone shed light on why this would so profoundly impact write
volume? Load on the rsyslog servers also went down substantially after
making this change - presumably because there were fewer queued transactions
- not sure.

2) Is there some difference between what "$ActionFileEnableSync on" does vs.
adding a "-" before an action other than one is global and one is
per-action? I thought these were just two different levels of granularity
for configuring the same thing but that doesn't appear to be the case.

3) Is it expected to have rsyslog take up to 30 seconds to flush messages to
a logfile? These are BUSY logs - being written to multiple times per second,
so it's not a delay on the client. I can observe these same clients logging
messages to another logging server (the old server) running syslog-ng and
messages are observable in near real-time.

Ultimate - #3 is the problem I am trying to solve, but I'm very curious why
these changes had the impact they did and why globally disabling file sync
using "$ActionFileEnableSync" would not entire decrease log volume until I
add "-" to all the actions.

Let me know if I can add info that would help shed light on this.

Thanks,
Aaron
Rainer Gerhards
2009-10-20 16:32:30 UTC
Permalink
Hi Aaron,

first and very important question: which version of rsyslog are you using?

Rainer
-----Original Message-----
From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog-
bounces at lists.adiscon.com] On Behalf Of Aaron Nichols
Sent: Tuesday, October 20, 2009 5:32 PM
To: rsyslog-users
Subject: [rsyslog] Enabling $ActionFileEnableSync caused massive
increase inwrite volume (bytes/sec) to NAS - can someone help shed
light?
Hello,
We have two rsyslog "writer" instances which are receiving syslog
messages and writing them to an NFS mount on a NAS. The NAS is
connected via
10gb ethernet but has a number of other applications writing to it as well.
Typical write volume for our log writers is 2-3Mbps to the NAS. Yesterday I
attempted to resolve some complaints with logs being "bursty" - events
appear to be written in batches which sounded consistent with buffering on
the rsyslog daemon or not syncing the file with every message. These bursts
could come as infrequently as every 30 seconds however, which seems
excessive. I took a number of steps to remedy this but ultimately had to
reverse the changes because write volume (measured at the disks) to the NAS
went from 2Mbps up to 9Mbps and iops went from 30 up to 900 at the peak. I'm
sure there is an explanation for this - but there are some subtle
configuration changes which had an impact and I don't understand exactly
why.
First, the main change was to set "$ActionFileEnableSync on" in each logging
servers configuration. From what I can tell this globally enables the
ability to turn on file sync and it defaults to "off". This change, combined
with removing the "-" from all the actions in my configuration should have
resulted in all files being synced all the time - probably the worst case
scenario and I'm not all that suprised this caused problems.
Second, before rolling the change back 100% - I removed the
"$ActionFileEnableSync on" configuration from the servers but left the
action configurations without the "-" - assuming that the global
configuration option would completely disable file sync and using a "-"
would have no effect. This did not seem to be the case - it did dramatically
decrease the volume of traffic to the NAS but still the volume was about
double (5Mbps) the normal volume.
Third, I added "-" before all the action lines in the configuration and this
brought volume back down to the 2-3Mbps we are used to.
1) I would expect file sync to increase transactions with the NAS but not
increase the volume of data being written to the NAS as measured at the
disk. Can someone shed light on why this would so profoundly impact write
volume? Load on the rsyslog servers also went down substantially after
making this change - presumably because there were fewer queued transactions
- not sure.
2) Is there some difference between what "$ActionFileEnableSync on" does vs.
adding a "-" before an action other than one is global and one is
per-action? I thought these were just two different levels of
granularity
for configuring the same thing but that doesn't appear to be the case.
3) Is it expected to have rsyslog take up to 30 seconds to flush messages to
a logfile? These are BUSY logs - being written to multiple times per second,
so it's not a delay on the client. I can observe these same clients logging
messages to another logging server (the old server) running syslog-ng and
messages are observable in near real-time.
Ultimate - #3 is the problem I am trying to solve, but I'm very curious why
these changes had the impact they did and why globally disabling file sync
using "$ActionFileEnableSync" would not entire decrease log volume until I
add "-" to all the actions.
Let me know if I can add info that would help shed light on this.
Thanks,
Aaron
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com
david
2009-10-21 05:12:39 UTC
Permalink
hmm, how large are the log entries? I'm wondering if the fact that every
log entry then updates the metadata (size, time modified, etc) could
account for most, if not all, of your extra traffic.

the other thing that you could be seeing is if the NFS server records
write traffic based on # of blocks written * block size.

when doing a sync after each write you will do a lot of blocks (but
re-write the same block many times), if it counts each of these writes as
full block that would greatly magnify the write totals.

I think you will really like the message batching in v5. you configure it
to output every X messages or Y ms, whichever comes first. doing a sync
after every write will have _far_ less effect.

David Lang
Date: Tue, 20 Oct 2009 09:32:07 -0600
From: Aaron Nichols <anichols at trumped.org>
Reply-To: rsyslog-users <rsyslog at lists.adiscon.com>
To: rsyslog-users <rsyslog at lists.adiscon.com>
Subject: [rsyslog] Enabling $ActionFileEnableSync caused massive increase in
write volume (bytes/sec) to NAS - can someone help shed light?
Hello,
We have two rsyslog "writer" instances which are receiving syslog
messages and writing them to an NFS mount on a NAS. The NAS is connected via
10gb ethernet but has a number of other applications writing to it as well.
Typical write volume for our log writers is 2-3Mbps to the NAS. Yesterday I
attempted to resolve some complaints with logs being "bursty" - events
appear to be written in batches which sounded consistent with buffering on
the rsyslog daemon or not syncing the file with every message. These bursts
could come as infrequently as every 30 seconds however, which seems
excessive. I took a number of steps to remedy this but ultimately had to
reverse the changes because write volume (measured at the disks) to the NAS
went from 2Mbps up to 9Mbps and iops went from 30 up to 900 at the peak. I'm
sure there is an explanation for this - but there are some subtle
configuration changes which had an impact and I don't understand exactly
why.
First, the main change was to set "$ActionFileEnableSync on" in each logging
servers configuration. From what I can tell this globally enables the
ability to turn on file sync and it defaults to "off". This change, combined
with removing the "-" from all the actions in my configuration should have
resulted in all files being synced all the time - probably the worst case
scenario and I'm not all that suprised this caused problems.
Second, before rolling the change back 100% - I removed the
"$ActionFileEnableSync on" configuration from the servers but left the
action configurations without the "-" - assuming that the global
configuration option would completely disable file sync and using a "-"
would have no effect. This did not seem to be the case - it did dramatically
decrease the volume of traffic to the NAS but still the volume was about
double (5Mbps) the normal volume.
Third, I added "-" before all the action lines in the configuration and this
brought volume back down to the 2-3Mbps we are used to.
1) I would expect file sync to increase transactions with the NAS but not
increase the volume of data being written to the NAS as measured at the
disk. Can someone shed light on why this would so profoundly impact write
volume? Load on the rsyslog servers also went down substantially after
making this change - presumably because there were fewer queued transactions
- not sure.
2) Is there some difference between what "$ActionFileEnableSync on" does vs.
adding a "-" before an action other than one is global and one is
per-action? I thought these were just two different levels of granularity
for configuring the same thing but that doesn't appear to be the case.
3) Is it expected to have rsyslog take up to 30 seconds to flush messages to
a logfile? These are BUSY logs - being written to multiple times per second,
so it's not a delay on the client. I can observe these same clients logging
messages to another logging server (the old server) running syslog-ng and
messages are observable in near real-time.
Ultimate - #3 is the problem I am trying to solve, but I'm very curious why
these changes had the impact they did and why globally disabling file sync
using "$ActionFileEnableSync" would not entire decrease log volume until I
add "-" to all the actions.
Let me know if I can add info that would help shed light on this.
Thanks,
Aaron
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com
Aaron Nichols
2009-10-20 15:32:07 UTC
Permalink
Hello,
We have two rsyslog "writer" instances which are receiving syslog
messages and writing them to an NFS mount on a NAS. The NAS is connected via
10gb ethernet but has a number of other applications writing to it as well.
Typical write volume for our log writers is 2-3Mbps to the NAS. Yesterday I
attempted to resolve some complaints with logs being "bursty" - events
appear to be written in batches which sounded consistent with buffering on
the rsyslog daemon or not syncing the file with every message. These bursts
could come as infrequently as every 30 seconds however, which seems
excessive. I took a number of steps to remedy this but ultimately had to
reverse the changes because write volume (measured at the disks) to the NAS
went from 2Mbps up to 9Mbps and iops went from 30 up to 900 at the peak. I'm
sure there is an explanation for this - but there are some subtle
configuration changes which had an impact and I don't understand exactly
why.

First, the main change was to set "$ActionFileEnableSync on" in each logging
servers configuration. From what I can tell this globally enables the
ability to turn on file sync and it defaults to "off". This change, combined
with removing the "-" from all the actions in my configuration should have
resulted in all files being synced all the time - probably the worst case
scenario and I'm not all that suprised this caused problems.

Second, before rolling the change back 100% - I removed the
"$ActionFileEnableSync on" configuration from the servers but left the
action configurations without the "-" - assuming that the global
configuration option would completely disable file sync and using a "-"
would have no effect. This did not seem to be the case - it did dramatically
decrease the volume of traffic to the NAS but still the volume was about
double (5Mbps) the normal volume.

Third, I added "-" before all the action lines in the configuration and this
brought volume back down to the 2-3Mbps we are used to.

This leaves me with a few questions:

1) I would expect file sync to increase transactions with the NAS but not
increase the volume of data being written to the NAS as measured at the
disk. Can someone shed light on why this would so profoundly impact write
volume? Load on the rsyslog servers also went down substantially after
making this change - presumably because there were fewer queued transactions
- not sure.

2) Is there some difference between what "$ActionFileEnableSync on" does vs.
adding a "-" before an action other than one is global and one is
per-action? I thought these were just two different levels of granularity
for configuring the same thing but that doesn't appear to be the case.

3) Is it expected to have rsyslog take up to 30 seconds to flush messages to
a logfile? These are BUSY logs - being written to multiple times per second,
so it's not a delay on the client. I can observe these same clients logging
messages to another logging server (the old server) running syslog-ng and
messages are observable in near real-time.

Ultimate - #3 is the problem I am trying to solve, but I'm very curious why
these changes had the impact they did and why globally disabling file sync
using "$ActionFileEnableSync" would not entire decrease log volume until I
add "-" to all the actions.

Let me know if I can add info that would help shed light on this.

Thanks,
Aaron
Rainer Gerhards
2009-10-20 16:32:30 UTC
Permalink
Hi Aaron,

first and very important question: which version of rsyslog are you using?

Rainer
-----Original Message-----
From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog-
bounces at lists.adiscon.com] On Behalf Of Aaron Nichols
Sent: Tuesday, October 20, 2009 5:32 PM
To: rsyslog-users
Subject: [rsyslog] Enabling $ActionFileEnableSync caused massive
increase inwrite volume (bytes/sec) to NAS - can someone help shed
light?
Hello,
We have two rsyslog "writer" instances which are receiving syslog
messages and writing them to an NFS mount on a NAS. The NAS is
connected via
10gb ethernet but has a number of other applications writing to it as well.
Typical write volume for our log writers is 2-3Mbps to the NAS. Yesterday I
attempted to resolve some complaints with logs being "bursty" - events
appear to be written in batches which sounded consistent with buffering on
the rsyslog daemon or not syncing the file with every message. These bursts
could come as infrequently as every 30 seconds however, which seems
excessive. I took a number of steps to remedy this but ultimately had to
reverse the changes because write volume (measured at the disks) to the NAS
went from 2Mbps up to 9Mbps and iops went from 30 up to 900 at the peak. I'm
sure there is an explanation for this - but there are some subtle
configuration changes which had an impact and I don't understand exactly
why.
First, the main change was to set "$ActionFileEnableSync on" in each logging
servers configuration. From what I can tell this globally enables the
ability to turn on file sync and it defaults to "off". This change, combined
with removing the "-" from all the actions in my configuration should have
resulted in all files being synced all the time - probably the worst case
scenario and I'm not all that suprised this caused problems.
Second, before rolling the change back 100% - I removed the
"$ActionFileEnableSync on" configuration from the servers but left the
action configurations without the "-" - assuming that the global
configuration option would completely disable file sync and using a "-"
would have no effect. This did not seem to be the case - it did dramatically
decrease the volume of traffic to the NAS but still the volume was about
double (5Mbps) the normal volume.
Third, I added "-" before all the action lines in the configuration and this
brought volume back down to the 2-3Mbps we are used to.
1) I would expect file sync to increase transactions with the NAS but not
increase the volume of data being written to the NAS as measured at the
disk. Can someone shed light on why this would so profoundly impact write
volume? Load on the rsyslog servers also went down substantially after
making this change - presumably because there were fewer queued transactions
- not sure.
2) Is there some difference between what "$ActionFileEnableSync on" does vs.
adding a "-" before an action other than one is global and one is
per-action? I thought these were just two different levels of
granularity
for configuring the same thing but that doesn't appear to be the case.
3) Is it expected to have rsyslog take up to 30 seconds to flush messages to
a logfile? These are BUSY logs - being written to multiple times per second,
so it's not a delay on the client. I can observe these same clients logging
messages to another logging server (the old server) running syslog-ng and
messages are observable in near real-time.
Ultimate - #3 is the problem I am trying to solve, but I'm very curious why
these changes had the impact they did and why globally disabling file sync
using "$ActionFileEnableSync" would not entire decrease log volume until I
add "-" to all the actions.
Let me know if I can add info that would help shed light on this.
Thanks,
Aaron
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com
david
2009-10-21 05:12:39 UTC
Permalink
hmm, how large are the log entries? I'm wondering if the fact that every
log entry then updates the metadata (size, time modified, etc) could
account for most, if not all, of your extra traffic.

the other thing that you could be seeing is if the NFS server records
write traffic based on # of blocks written * block size.

when doing a sync after each write you will do a lot of blocks (but
re-write the same block many times), if it counts each of these writes as
full block that would greatly magnify the write totals.

I think you will really like the message batching in v5. you configure it
to output every X messages or Y ms, whichever comes first. doing a sync
after every write will have _far_ less effect.

David Lang
Date: Tue, 20 Oct 2009 09:32:07 -0600
From: Aaron Nichols <anichols at trumped.org>
Reply-To: rsyslog-users <rsyslog at lists.adiscon.com>
To: rsyslog-users <rsyslog at lists.adiscon.com>
Subject: [rsyslog] Enabling $ActionFileEnableSync caused massive increase in
write volume (bytes/sec) to NAS - can someone help shed light?
Hello,
We have two rsyslog "writer" instances which are receiving syslog
messages and writing them to an NFS mount on a NAS. The NAS is connected via
10gb ethernet but has a number of other applications writing to it as well.
Typical write volume for our log writers is 2-3Mbps to the NAS. Yesterday I
attempted to resolve some complaints with logs being "bursty" - events
appear to be written in batches which sounded consistent with buffering on
the rsyslog daemon or not syncing the file with every message. These bursts
could come as infrequently as every 30 seconds however, which seems
excessive. I took a number of steps to remedy this but ultimately had to
reverse the changes because write volume (measured at the disks) to the NAS
went from 2Mbps up to 9Mbps and iops went from 30 up to 900 at the peak. I'm
sure there is an explanation for this - but there are some subtle
configuration changes which had an impact and I don't understand exactly
why.
First, the main change was to set "$ActionFileEnableSync on" in each logging
servers configuration. From what I can tell this globally enables the
ability to turn on file sync and it defaults to "off". This change, combined
with removing the "-" from all the actions in my configuration should have
resulted in all files being synced all the time - probably the worst case
scenario and I'm not all that suprised this caused problems.
Second, before rolling the change back 100% - I removed the
"$ActionFileEnableSync on" configuration from the servers but left the
action configurations without the "-" - assuming that the global
configuration option would completely disable file sync and using a "-"
would have no effect. This did not seem to be the case - it did dramatically
decrease the volume of traffic to the NAS but still the volume was about
double (5Mbps) the normal volume.
Third, I added "-" before all the action lines in the configuration and this
brought volume back down to the 2-3Mbps we are used to.
1) I would expect file sync to increase transactions with the NAS but not
increase the volume of data being written to the NAS as measured at the
disk. Can someone shed light on why this would so profoundly impact write
volume? Load on the rsyslog servers also went down substantially after
making this change - presumably because there were fewer queued transactions
- not sure.
2) Is there some difference between what "$ActionFileEnableSync on" does vs.
adding a "-" before an action other than one is global and one is
per-action? I thought these were just two different levels of granularity
for configuring the same thing but that doesn't appear to be the case.
3) Is it expected to have rsyslog take up to 30 seconds to flush messages to
a logfile? These are BUSY logs - being written to multiple times per second,
so it's not a delay on the client. I can observe these same clients logging
messages to another logging server (the old server) running syslog-ng and
messages are observable in near real-time.
Ultimate - #3 is the problem I am trying to solve, but I'm very curious why
these changes had the impact they did and why globally disabling file sync
using "$ActionFileEnableSync" would not entire decrease log volume until I
add "-" to all the actions.
Let me know if I can add info that would help shed light on this.
Thanks,
Aaron
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com
Loading...