Discussion:
[Check_mk (english)] Error getting data
Brian Binder
2018-06-04 12:38:02 UTC
Ever since moving from 1.2.8 to 1.4.0p33, I have issues with SNMP discovery.
Have you guys experienced it?  When I do a Full Scan, I get "Error getting data" for every single piece of information that exists, or should exist.
This happens every single time and I can reproduce it on every single piece of equipment that runs SNMP.  Doesn't matter who the vendor is - every single device that runs SNMP is affected.
How would you guys go about troubleshooting this?
Thanks!
Brian
Brian Binder
2018-06-04 17:17:53 UTC
If I perform a cmk -D HOSTNAME on the CLI, it returns all proper values in like 3 seconds.
Using WATO, it just lists UNKN for every single reported value, all the time.


_______________________________________________
checkmk-en mailing list
checkmk-***@lists.mathias-kettner.de
Manage your subscription or unsubscribe
http://lists.mathias-kettner.de/mailman/listinfo/checkmk-en
Brian Binder
2018-06-13 18:10:19 UTC
Can anyone else try this for me?  I can't be the only one experiencing this...
I have 1.4.0p33 and p34 builds and the exact same behavior is there.  On the 1.2p2x builds, this doesn't happen.
Just query your switches via SNMP, or anything using SNMP for that matter.  Full scan for me always gives me UNKN for responses.
I don't get it...


Andreas Döhler
2018-06-14 20:36:34 UTC
Hi Brian,

I can only say that this exact problem doesn't happen on any of my systems.
On the command line can you try the following command combination?
cmk --debug --check-discovery -vv HOSTNAME
cmk --debug --discover-marked-hosts -vv
and if this runs well you can also try
cmk --debug -vvI HOSTNAME
The last one does it the way WATO does when you try to find new checks.

Best regards
Andreas
Brian Binder
2018-06-15 01:42:00 UTC
Thanks for the tips, Andreas.
I'm going through it...first command gets me this:
WARN - Discovery failed: Failed to lookup IPv4 address of -vv via DNS: [Errno -3] Temporary failure in name resolution.
Weird though, as I can ping it and the host command works perfectly.  No issues with resolving whatsoever.

Second command results:
Doing discovery for all marked hosts:
  Nothing to do. No hosts marked by discovery check.

Third command runs perfectly and discovers everything in maybe 3 seconds.
Last line states nothing new, but no errors are thrown.

Just has an issue with the first command, but not sure why.  DNS is internal, dns-search is in use and everything resolves as expected.  I'll search and see what I can find.  Either way though, I still have the IPv4 address in WATO, like I always do.
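
A possible explanation for the DNS error, offered as an assumption rather than something confirmed in this thread: the message says the lookup was for "-vv", which suggests that --check-discovery takes the hostname as its own argument and consumed the next token on the command line. The Python sketch below reproduces that effect with the standard getopt module. The option table is a hypothetical stand-in, not the real cmk option list, and "myswitch" is a placeholder hostname.

```python
import getopt

# Hypothetical option table for illustration only (not the real cmk one):
# "check-discovery=" means the long option expects a value (the hostname).
SHORT_OPTS = "v"
LONG_OPTS = ["debug", "check-discovery="]


def parse(argv):
    """Return (options, leftover_args) the way getopt splits them."""
    return getopt.getopt(argv, SHORT_OPTS, LONG_OPTS)


def discovery_target(argv):
    """Return the value that --check-discovery received, or None."""
    opts, _ = parse(argv)
    for name, value in opts:
        if name == "--check-discovery":
            return value
    return None


# Order used in the thread: the option swallows "-vv" as its hostname.
as_posted = ["--debug", "--check-discovery", "-vv", "myswitch"]
# Flags first, hostname directly after the option that needs it.
reordered = ["--debug", "-vv", "--check-discovery", "myswitch"]

if __name__ == "__main__":
    print(discovery_target(as_posted))
    print(discovery_target(reordered))
```

If that assumption holds, putting the flags first (cmk --debug -vv --check-discovery HOSTNAME) would hand the real hostname to the discovery check instead of "-vv".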


Brian Binder
2018-06-15 02:34:02 UTC
This is what happens: https://cl.ly/153v1s1A0v1j
After the first check (save the host, then go to services), half the time everything shows green and OK.  Then if I do a full scan, all of them go to unknown.
The checks work overall though, which is strange as well.
If I do the full scan, this does represent what is active, even though all the data displayed is missing.  Once I save the host and it polls, the data is collected like any other working host.  But if I do a full scan later on, it will always show unknown on every single thing listed.


Brian Binder
2018-06-15 03:31:19 UTC
I have to test this a bit more, but I think I got it.
I went over a few sites that I performed the upgrades for and found 1 that did not have the issue - the difference was that there was no modified rule for Check intervals for SNMP checks.
I usually set these to a higher-than-default value, say 15 minutes, because in some of the stacked switch configurations it would take a bit longer to get the interface information, and sometimes I'd see continuous high CPU spikes on the switches.
So, 15 minutes was fine for me in many instances.
I had this set for All SNMP Checks, 15 minutes.
I disabled that rule and immediately, everything works as expected, and nothing goes into an unknown state - it's green across the board.
Re-enable that rule and it all goes unknown again.  But again, this only happens in the 1.4 versions of Check_MK for me.  I do not see this issue in 1.2.8pxx versions.
I do appreciate your help.  I wonder if you were to set the same 15 minute interval for all SNMP checks for a single SNMP-enabled host, like a switch, could you reproduce my issue?  I can reproduce on 5 upgraded sites if that modified rule is present.
Is this just expected behavior of version 1.4?  Guess I'm not used to seeing something like that - it looks like "something is broken" instead of it honoring a 15 minute check delay.
Brian


Andreas Döhler
2018-06-19 19:40:39 UTC
Hi Brian,

I watched your short screencast and I think I know what is happening there. If
you have the SNMP interval set, then the scan thinks "oh, I should not do
this, as it is not yet 15 minutes since the last check".
If I have some time in the next few days, I can check whether the current 1.5
version behaves the same way.
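
To make that explanation concrete, here is a toy Python model of the behavior as described. It is not Check_MK's actual code: it only shows a fetcher that refuses to query a device again until the configured interval has passed, so a scan that arrives early gets no data. The names should_fetch and scan and the UNKN placeholder text are made up for this sketch.

```python
from typing import Callable, Optional

INTERVAL_SECONDS = 15 * 60  # the 15 minute SNMP check interval from the thread


def should_fetch(now: float, last_fetch: Optional[float],
                 interval: float = INTERVAL_SECONDS) -> bool:
    """True if no fetch happened yet or the interval has fully elapsed."""
    return last_fetch is None or (now - last_fetch) >= interval


def scan(now: float, last_fetch: Optional[float],
         fetch: Callable[[], str]) -> str:
    """Return fresh data when allowed, else a placeholder for 'no data'."""
    if should_fetch(now, last_fetch):
        return fetch()
    # Too early: the device is not queried, so there is nothing to display.
    return "UNKN - no data (interval not elapsed)"


if __name__ == "__main__":
    def fake_fetch() -> str:
        return "OK - 24 interfaces"

    print(scan(now=1000.0, last_fetch=None, fetch=fake_fetch))    # first poll
    print(scan(now=1300.0, last_fetch=1000.0, fetch=fake_fetch))  # 5 min later
    print(scan(now=1900.0, last_fetch=1000.0, fetch=fake_fetch))  # 15 min later
```

In this model a full scan started a few minutes after the last regular poll lands in the "too early" branch, which would match every service showing UNKN while normal polling keeps working.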

On my systems I don't have rules for the SNMP checks alone. If I want to
check a switch only every 15 minutes, I set the "Normal check interval for
service checks" and "Retry check interval for service checks" to the desired
interval.
With this method I have had no problems in all these years.

Best regards
Andreas
Brian Binder
2018-06-21 12:17:40 UTC
Thanks, Andreas!
I just wasn't used to seeing it in 1.4, since 1.2 didn't behave the same way.  Makes it look like something is wrong when I see the UNKN all over the place :(
I like your advice though, on changing the service checks themselves, which sounds like a good plan to adopt, should I need to up the intervals.  Thanks again for all your help!
Brian