Discussion:
[systemd-devel] Systemd Journald and audit logging causing journal issues
Brad Zynda
2017-12-01 14:05:19 UTC
Permalink
Hey Lennart,

Just wanted to get your thoughts on this before we break the link..

also can you provide proper direction or a howto for breaking the link
between auditd and journald?

Thanks,
Brad
Hey Steve,
Just wanted to follow up on this and say we are still seeing services
Warning: Journal has been rotated since unit was started. Log output is
incomplete or unavailable
basically created a script to check all unit file services/targets and
grep status -l for Journal, it is effecting a large range of
service.target's, service.service's and service.socket's
If we restart the service or reboot we no longer see those messages
about journal and everything appears to run as expected.
I have never looked at journald code and have no idea how it works or
why it
cares about audit events. My advice last email was to break the link if its
causing problems.

-Steve
grep perm_mod /etc/audit/audit.rules
-a always,exit -F arch=b64 -S chmod -S fchmod -S fchmodat -F
auid>=1000
-F auid!=4294967295 -k perm_mod
-a always,exit -F arch=b32 -S chmod -S fchmod -S fchmodat -F
auid>=1000
-F auid!=4294967295 -k perm_mod
-a always,exit -F arch=b64 -S chown -S fchown -S fchownat -S lchown -F
auid>=1000 -F auid!=4294967295 -k perm_mod
-a always,exit -F arch=b32 -S chown -S fchown -S fchownat -S lchown -F
auid>=1000 -F auid!=4294967295 -k perm_mod
-a always,exit -F arch=b64 -S setxattr -S lsetxattr -S fsetxattr -S
removexattr -S lremovexattr -S fremovexattr -F auid>=1000 -F
auid!=4294967295 -k perm_mod
-a always,exit -F arch=b32 -S setxattr -S lsetxattr -S fsetxattr -S
removexattr -S lremovexattr -S fremovexattr -F auid>=1000 -F
auid!=4294967295 -k perm_mod
grep delete /etc/audit/audit.rules
-a always,exit -F arch=b64 -S unlink -S unlinkat -S rename -S renameat
-F auid>=1000 -F auid!=4294967295 -k delete
-a always,exit -F arch=b32 -S unlink -S unlinkat -S rename -S renameat
-F auid>=1000 -F auid!=4294967295 -k delete
-a always,exit -F arch=b64 -S init_module -S delete_module -k modules
-a always,exit -F arch=b32 -S init_module -S delete_module -k modules
These rules are well formed. Meaning no obvious holes that would cause
unintended events. The other ausearch/aureport commands I gave you
should
show you what is causing the events and to which files. This
information
may be used to create some kind of "never" rule that limits what gets
audited. If you do create some exclusion rule, it needs to be above the
perm_mod rules because audit is a "first match wins" system.
-Steve
-Steve
Here is a peak report (at least in the last 24 hours) didnt include the
Well, the idea is to look for something that's getting hit a lot. What it
sounds like is that things are getting deleted and modified quite a bit
all
over the place. Does the executable report show a pattern such as one
application doing a lot? For example, with the rule you have, doing a yum
update will delete a whole lot of stuff and modify a whole lot of stuff.
Its possible to put exceptions in the rules so that one program does not
flood the logs. But looking at the quantities below, the audit system can
easily handle that.
Its also possible to exclude directories from auditing if the pattern is
that you have a daemon receiving and modifying files and then deleting
them. You should look at the executables & files and see if you can do
with auditing what they are doing because its not interesting.
If this is causing you problems on the journald side where its causing
other tasks to fail. Then I'd break the link between auditd and journald.
The amount of event you show is highish but well within range of what
auditd can do. Just make sure flush is set to incremental_async and freq
is 100 or 200. You should be OK.
-Steve
Key Summary Report
===========================
total key
===========================
8170 perm_mod
5390 delete
1073 access
56 time-change
26 session
12 privileged
7 logins
Syscall Summary Report
==========================
total syscall
==========================
4250 fchmodat
1613 chmod
831 fchmod
521 fchown
97 chown
12 setxattr
Syscall Summary Report
==========================
total syscall
==========================
2887 unlink
2189 rename
186 unlinkat
so from the list the top 2 in each perm_mod and delete from the above
list seem to be the culprits..
Lennart Poettering
2017-12-01 17:11:11 UTC
Permalink
Post by Brad Zynda
Hey Lennart,
Heya,
Post by Brad Zynda
Just wanted to get your thoughts on this before we break the link..
On what precisely? I am not sure what the original issue is, can you
summarize this briefly here?
Post by Brad Zynda
also can you provide proper direction or a howto for breaking the link
between auditd and journald?
auditd and journald aren't linked. Both get the audit data directly
from the kernel. Either can run fine without the other. To turn off
that journald picks up audit data from the kernel use "systemctl mask
systemd-journald-audit.socket" (and reboot), which will turn off all
audit data collection by journald.

But again, I am not entirely sure what the original issue is?

Lennart
--
Lennart Poettering, Red Hat
Brad Zynda
2017-12-01 17:19:28 UTC
Permalink
Post by Lennart Poettering
Post by Brad Zynda
Hey Lennart,
Heya,
Post by Brad Zynda
Just wanted to get your thoughts on this before we break the link..
On what precisely? I am not sure what the original issue is, can you
summarize this briefly here?
Absolutely, here is the previous email chain:



Hello Everyone,

I am sending along an issue brought to the systemd-journald dev list
Post by Lennart Poettering
Post by Brad Zynda
Sep 28 13:50:03 server systemd-journal[565]: Suppressed 73244 messages
from /system.slice/auditd.service
The question is: why does auditd even log to the journal?
Post by Brad Zynda
Now we are required to have full audit rules and does this look like at
rate limiting issue or an issue of journal not able to handle the
traffic to logging?
journald detected that it got flooded with too many messages in too
short a time from auditd. if this happens then something is almost
certainly off with auditd, as auditd is not supposed to flood journald
with messages, after all it maintains its own auditing log database.
Please ping the auditd folks for help
Lennart
_______________________________________________
systemd-devel mailing list
systemd-***@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel





Hey Everyone,

Not sure if this is a bug so:

systemctl status -l systemd-journald.service
● systemd-journald.service - Journal Service
Loaded: loaded (/usr/lib/systemd/system/systemd-journald.service;
static; vendor preset: disabled)
Active: active (running) since Tue 2017-09-26 20:01:16 UTC; 5 days ago
Docs: man:systemd-journald.service(8)
man:journald.conf(5)
Main PID: 565 (systemd-journal)
Status: "Processing requests..."
CGroup: /system.slice/systemd-journald.service
└─565 /usr/lib/systemd/systemd-journald

Sep 28 13:50:03 server systemd-journal[565]: Suppressed 73244 messages
from /system.slice/auditd.service
Sep 28 13:51:03 server systemd-journal[565]: Suppressed 98979 messages
from /system.slice/auditd.service
Sep 28 13:52:03 server systemd-journal[565]: Suppressed 109433 messages
from /system.slice/auditd.service
Sep 28 13:53:03 server systemd-journal[565]: Suppressed 99788 messages
from /system.slice/auditd.service
Sep 28 13:54:03 server systemd-journal[565]: Suppressed 111605 messages
from /system.slice/auditd.service
Sep 28 13:55:03 server systemd-journal[565]: Suppressed 111591 messages
from /system.slice/auditd.service
Sep 28 13:56:03 server systemd-journal[565]: Suppressed 107947 messages
from /system.slice/auditd.service
Sep 28 13:57:51 server systemd-journal[565]: Suppressed 32760 messages
from /system.slice/auditd.service
Sep 28 17:21:40 server systemd-journal[565]: Suppressed 210 messages
from /system.slice/auditd.service
Oct 01 02:16:01 server systemd-journal[565]: Suppressed 1333 messages
from /system.slice/auditd.service



journalctl --verify
PASS: /run/log/journal/d28b0080ffe0432a974f36e4fb4bfa9b/system.journal
PASS:
/run/log/journal/d28b0080ffe0432a974f36e4fb4bfa9b/***@0d49221d68d04ef0b95d8203c5e96a46-000000000097f6c7-0005596b745b4d1c.journal
PASS:
/run/log/journal/d28b0080ffe0432a974f36e4fb4bfa9b/***@0d49221d68d04ef0b95d8203c5e96a46-000000000096a587-00055966f35ae59a.journal
PASS:
/run/log/journal/d28b0080ffe0432a974f36e4fb4bfa9b/***@0d49221d68d04ef0b95d8203c5e96a46-00000000009554f1-000559629c4cdb7e.journal
PASS:
/run/log/journal/d28b0080ffe0432a974f36e4fb4bfa9b/***@0d49221d68d04ef0b95d8203c5e96a46-0000000000940591-0005595e1811a2d1.journal
PASS:
/run/log/journal/d28b0080ffe0432a974f36e4fb4bfa9b/***@0d49221d68d04ef0b95d8203c5e96a46-000000000092b500-00055959f2de5ede.journal
PASS:
/run/log/journal/d28b0080ffe0432a974f36e4fb4bfa9b/***@0d49221d68d04ef0b95d8203c5e96a46-0000000000916479-0005595573137b74.journal
PASS:
/run/log/journal/d28b0080ffe0432a974f36e4fb4bfa9b/***@0d49221d68d04ef0b95d8203c5e96a46-0000000000901337-00055950d80cc3d8.journal
PASS:
/run/log/journal/d28b0080ffe0432a974f36e4fb4bfa9b/***@0d49221d68d04ef0b95d8203c5e96a46-00000000008ec2fb-0005594cad14b07a.journal
PASS:
/run/log/journal/d28b0080ffe0432a974f36e4fb4bfa9b/***@0d49221d68d04ef0b95d8203c5e96a46-00000000008d7373-0005594838683e58.journal
PASS:
/run/log/journal/d28b0080ffe0432a974f36e4fb4bfa9b/***@0d49221d68d04ef0b95d8203c5e96a46-00000000008c238e-00055943fe2072e3.journal
PASS:
/run/log/journal/d28b0080ffe0432a974f36e4fb4bfa9b/***@0d49221d68d04ef0b95d8203c5e96a46-00000000008ad1d9-0005593ff64a4f69.journal
PASS:
/run/log/journal/d28b0080ffe0432a974f36e4fb4bfa9b/***@0d49221d68d04ef0b95d8203c5e96a46-0000000000897f32-0005593e18c5758b.journal


journalctl --disk-usage
Archived and active journals take up 1.1G on disk.


Initially we saw:
16733 PATH
5070 SYSCALL
5024 CWD
3765 AVC
323 CRYPTO_KEY_USER
223 USER_START
222 USER_ACCT
222 CRED_ACQ
220 LOGIN
220 CRED_REFR
218 USER_END
218 CRED_DISP
46 USER_LOGIN
12 EXECVE
4 USER_AUTH
2 CRYPTO_SESSION
1 USER_ROLE_CHANGE
1 USER_CMD
1 SERVICE_STOP
1 SERVICE_START
1 BPRM_FCAPS

so we blocked type PATH in audit.rules

But we are still seeing 100K of dropped/suppressed messages.

Note: systemloglevel = INFO

Centos 7 1708 3.10.0-693.2.2.el7.x86_64

systemd.x86_64 219-42.el7_4.1


Now we are required to have full audit rules and does this look like at
rate limiting issue or an issue of journal not able to handle the
traffic to logging?

Error we are seeing from services that have silently failed, in this
case glassfish..

systemctl status -l glassfish
● glassfish.service - SYSV: GlassFish start and stop daemon
Loaded: loaded (/etc/rc.d/init.d/glassfish; bad; vendor preset: disabled)
Active: active (exited) since Tue 2017-09-26 20:01:36 UTC; 5 days ago
Docs: man:systemd-sysv-generator(8)
Process: 1328 ExecStart=/etc/rc.d/init.d/glassfish start (code=exited,
status=0/SUCCESS)

Warning: Journal has been rotated since unit was started. Log output is
incomplete or unavailable.

Eventually glassfish will fail but it wont kill the service so we never
get an nms service down trap from the OID.

Please let me know if further info is needed or if certain limits need
to be adjusted.

Thanks,
Brad Zynda
Post by Lennart Poettering
Post by Brad Zynda
also can you provide proper direction or a howto for breaking the link
between auditd and journald?
auditd and journald aren't linked. Both get the audit data directly
from the kernel. Either can run fine without the other. To turn off
that journald picks up audit data from the kernel use "systemctl mask
systemd-journald-audit.socket" (and reboot), which will turn off all
audit data collection by journald.
But again, I am not entirely sure what the original issue is?
Lennart
Lennart Poettering
2017-12-01 17:31:32 UTC
Permalink
Post by Lennart Poettering
Post by Brad Zynda
Hey Lennart,
Heya,
Post by Brad Zynda
Just wanted to get your thoughts on this before we break the link..
On what precisely? I am not sure what the original issue is, can you
summarize this briefly here?
Ah, completely forgot about that thread...

I am still wondering: why does auditd log to the journal (or syslog or
stdout/stderr) at all? It maintains its own log record database to my
knowledge. Moreover, journald is pulls audit data directly from the
kernel anyway, hence there's no need to also push data in from auditd
anyway....

Lennart
--
Lennart Poettering, Red Hat
Brad Zynda
2017-12-01 17:39:23 UTC
Permalink
Post by Lennart Poettering
Post by Lennart Poettering
Post by Brad Zynda
Hey Lennart,
Heya,
Post by Brad Zynda
Just wanted to get your thoughts on this before we break the link..
On what precisely? I am not sure what the original issue is, can you
summarize this briefly here?
Ah, completely forgot about that thread...
I am still wondering: why does auditd log to the journal (or syslog or
stdout/stderr) at all? It maintains its own log record database to my
knowledge. Moreover, journald is pulls audit data directly from the
kernel anyway, hence there's no need to also push data in from auditd
anyway....
Lennart
Hey Steve,
Just wanted to follow up on this and say we are still seeing services
Warning: Journal has been rotated since unit was started. Log output is
incomplete or unavailable
basically created a script to check all unit file services/targets and
grep status -l for Journal, it is effecting a large range of
service.target's, service.service's and service.socket's
If we restart the service or reboot we no longer see those messages
about journal and everything appears to run as expected.
I have never looked at journald code and have no idea how it works or
why it
cares about audit events. My advice last email was to break the link if its
causing problems.

-Steve
Post by Lennart Poettering
Post by Lennart Poettering
Post by Brad Zynda
grep perm_mod /etc/audit/audit.rules
-a always,exit -F arch=b64 -S chmod -S fchmod -S fchmodat -F
auid>=1000
Post by Lennart Poettering
Post by Lennart Poettering
Post by Brad Zynda
-F auid!=4294967295 -k perm_mod
-a always,exit -F arch=b32 -S chmod -S fchmod -S fchmodat -F
auid>=1000
Post by Lennart Poettering
Post by Lennart Poettering
Post by Brad Zynda
-F auid!=4294967295 -k perm_mod
-a always,exit -F arch=b64 -S chown -S fchown -S fchownat -S lchown -F
auid>=1000 -F auid!=4294967295 -k perm_mod
-a always,exit -F arch=b32 -S chown -S fchown -S fchownat -S lchown -F
auid>=1000 -F auid!=4294967295 -k perm_mod
-a always,exit -F arch=b64 -S setxattr -S lsetxattr -S fsetxattr -S
removexattr -S lremovexattr -S fremovexattr -F auid>=1000 -F
auid!=4294967295 -k perm_mod
-a always,exit -F arch=b32 -S setxattr -S lsetxattr -S fsetxattr -S
removexattr -S lremovexattr -S fremovexattr -F auid>=1000 -F
auid!=4294967295 -k perm_mod
grep delete /etc/audit/audit.rules
-a always,exit -F arch=b64 -S unlink -S unlinkat -S rename -S renameat
-F auid>=1000 -F auid!=4294967295 -k delete
-a always,exit -F arch=b32 -S unlink -S unlinkat -S rename -S renameat
-F auid>=1000 -F auid!=4294967295 -k delete
-a always,exit -F arch=b64 -S init_module -S delete_module -k modules
-a always,exit -F arch=b32 -S init_module -S delete_module -k modules
These rules are well formed. Meaning no obvious holes that would cause
unintended events. The other ausearch/aureport commands I gave you
should
show you what is causing the events and to which files. This
information
Post by Lennart Poettering
Post by Lennart Poettering
Post by Brad Zynda
may be used to create some kind of "never" rule that limits what gets
audited. If you do create some exclusion rule, it needs to be above the
perm_mod rules because audit is a "first match wins" system.
-Steve
-Steve
Here is a peak report (at least in the last 24 hours) didnt include the
Well, the idea is to look for something that's getting hit a lot. What it
sounds like is that things are getting deleted and modified quite a bit
all
over the place. Does the executable report show a pattern such as one
application doing a lot? For example, with the rule you have, doing a yum
update will delete a whole lot of stuff and modify a whole lot of stuff.
Its possible to put exceptions in the rules so that one program does not
flood the logs. But looking at the quantities below, the audit system can
easily handle that.
Its also possible to exclude directories from auditing if the pattern is
that you have a daemon receiving and modifying files and then deleting
them. You should look at the executables & files and see if you can do
with auditing what they are doing because its not interesting.
If this is causing you problems on the journald side where its causing
other tasks to fail. Then I'd break the link between auditd and journald.
The amount of event you show is highish but well within range of what
auditd can do. Just make sure flush is set to incremental_async and freq
is 100 or 200. You should be OK.
-Steve
Post by Lennart Poettering
Key Summary Report
===========================
total key
===========================
8170 perm_mod
5390 delete
1073 access
56 time-change
26 session
12 privileged
7 logins
Syscall Summary Report
==========================
total syscall
==========================
4250 fchmodat
1613 chmod
831 fchmod
521 fchown
97 chown
12 setxattr
Syscall Summary Report
==========================
total syscall
==========================
2887 unlink
2189 rename
186 unlinkat
so from the list the top 2 in each perm_mod and delete from the above
list seem to be the culprits..
So it sounds as if know one knows how audit works?

Thanks,
Brad

Loading...