Error logging on VMS

Discussion:

Simon Clubley

2021-08-23 17:49:04 UTC

I needed to spend some time in the error logging system this weekend
and I now have some questions and comments.

1) On x86-64 VMS, can the default anal/error command once again be
mapped to whatever the default error log display program will be on
x86-64 VMS ? The current command sequence on Alpha is a real pain.

Looking at x86-64 error logs on an x86-64 system is going to be the
most popular requirement on x86-64 VMS. If you want to look at other
format error logs, then _that_ can be selected via a qualifier.

The reason for the current messy command sequence on Alpha is that
the current error log display program replaced the existing one,
and for some reason the new program was placed on a qualifier
instead of being made the default program from then on.

There's no reason for that on x86-64 VMS as there isn't a previous
version of the display program to replace on x86-64 VMS.

2) Errors logged against transient devices (such as network protocol
devices) appear to disappear when the device is deleted.

Should VMS be altered (if possible) to copy those error counters
somewhere permanent so they always show up in "show error" until
the counters are manually reset by the system manager ?

If the device gets deleted before you have a chance to run "show error",
you don't see that an error occurred in the first place unless
you routinely scan through the error logs just in case _and_
something was actually written to the error log.
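
To make the idea concrete, here is a minimal sketch of counters that outlive their device. It is a toy model in Python, not VMS code: the class, its method names and the device name NET1 are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ErrorRegistry:
    """Hypothetical model: per-device error counts that outlive the device."""
    live: dict = field(default_factory=dict)      # device name -> error count
    retained: dict = field(default_factory=dict)  # counts kept after deletion

    def create_device(self, name):
        self.live[name] = 0

    def record_error(self, name):
        self.live[name] += 1

    def delete_device(self, name):
        count = self.live.pop(name)
        if count:
            # Keep non-zero counts so they stay visible after deletion.
            self.retained[name] = self.retained.get(name, 0) + count

    def show_error(self):
        """Return every non-zero count, whether the device still exists or not."""
        report = {n: c for n, c in self.live.items() if c}
        for n, c in self.retained.items():
            report[n] = report.get(n, 0) + c
        return report

    def reset(self):
        """Manual reset, as the system manager would do."""
        self.retained.clear()
        for n in self.live:
            self.live[n] = 0
```

In this model a device that is created, logs one error and is then deleted still appears in show_error() until reset() is called, which is the behaviour being asked for.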

3) It's too late for this now, but it would have been nice if the
only way to increase an error counter on VMS was via an API that
_always_ logged something in the error log so that you could see
there had been an error logged, even if something later happened
such as a system crash.

Simon.

--
Simon Clubley, ***@remove_me.eisner.decus.org-Earth.UFP
Walking destinations on a map are further away than they appear.

Simon Clubley

2021-08-23 18:18:42 UTC

Post by Simon Clubley
3) It's too late for this now, but it would have been nice if the
only way to increase an error counter on VMS was via an API that
_always_ logged something in the error log so that you could see
there had been an error logged, even if something later happened
such as a system crash.

I've just read this again and given my history I should point out
that I have _not_ been crashing systems this weekend. :-)

That was just an example of how you can lose an error indicator
unless there's a permanent record of it.

Simon.

--
Simon Clubley, ***@remove_me.eisner.decus.org-Earth.UFP
Walking destinations on a map are further away than they appear.

Ian Miller

2021-08-24 15:49:58 UTC

Post by Simon Clubley

I've just read this again and given my history I should point out
that I have _not_ been crashing systems this weekend. :-)
That was just an example of how you can lose an error indicator
unless there's a permanent record of it.
Simon.
--
Walking destinations on a map are further away than they appear.

Device drivers can choose whether or not to send error reports to the error log in addition to incrementing the error count for the device. Incrementing the error count is a single instruction that increments a location in the device UCB, so I guess finding all of those and adding a call to $SNDERR [or the equivalent executive subroutine] is possible but unlikely. This change could be made to specific drivers by VSI if you could persuade them.
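
As a rough illustration of coupling the two steps, the sketch below routes every counter increment through one function that also writes a log record. It is a Python toy model under stated assumptions, not driver code: Device, report_error and the list used as the error log are hypothetical names, and nothing here reflects the real UCB layout or the real logging interface.

```python
class Device:
    """Hypothetical stand-in for a device's control block."""

    def __init__(self, name):
        self.name = name
        self._error_count = 0

    @property
    def error_count(self):
        # Read-only view: there is no setter, so callers cannot assign to it.
        return self._error_count


def report_error(device, error_log, detail):
    """Hypothetical single entry point: write a log record, then bump the count."""
    # The record is appended first so a permanent trace exists
    # before the in-memory counter changes.
    error_log.append((device.name, detail))
    device._error_count += 1
```

Python cannot truly stop code from touching the private field, so this only shows the shape of the interface; the point is that the counter and the log record change together in one place.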

Stephen Hoffman

2021-08-24 22:24:42 UTC

Post by Simon Clubley
I needed to spend some time in the error logging system this weekend
and I now have some questions and comments.
...
Should VMS be altered (if possible) to copy those error counters
somewhere permanent so they always show up in "show error" until the
counters are manually reset by the system manager ?

The drivers request error logging, and there's a bit that can be set to
copy a buffer into the error logs.

Many of the existing device drivers don't implement or don't request
much in the way of error logging. (Doc here is murky at best too,
having just looked at it again.)

Among those drivers that do log, formatting the logged errors has been
an ongoing problem as differing formats and firmware and contents and
the rest of the morass all stop by for a visit.

For tooling, early on was SYE, and then a couple of different add-on
error-related products became available and later faded, so OpenVMS
added the ELV tool to translate errors from core parts.

DIAGNOSE and WEBES / SEA were two of the tools here, SNMP and MIBs
touch on this area, and there were and are other tools in this same
area.

HP and HPE have gone toward Redfish/DMTF more recently, which touches
on these same error-logging and hardware configuration areas.

This, startup and shutdown, IP integration (SNMP, IPMI, DMTF, etc), and
other server management-related areas—not the least of which are
operator comms—all need some work.

As has been discussed in various threads.

But... After the port.

Once VSI figures out what they're going to work on next, post-port.

If not the Arm port. 🤷🏼

--
Pure Personal Opinion | HoffmanLabs LLC