How do I enable host-based thresholds on a VMS system?

ViewPoint for OpenVMS version 4.3-R10 and later includes a threshold agent, VPTHRESHD, that lets you set thresholds on the host and have violations reported in one of four ways:
1. OpenVMS mail
2. SMTP (Internet) mail
3. SNMP traps that can be sent to a trap log on a UNIX-based host
4. Run an OpenVMS command file
For more information on how VPTHRESHD works, please refer to the following files:
VVP_HOME:VPTHRESHD.PS;
VVP_HOME:VPTHRESHD.TXT
VPTHRESHD starts up much like ViewPoint: a command file reads an INI file. To set it up:
1. $COPY VVP_HOME:VPTHRESHD_STARTUP.COM_TEMPLATE VVP_HOME:VPTHRESHD_STARTUP.COM
2. Make the appropriate changes to the copied VPTHRESHD_STARTUP.COM
3. $COPY VVP_ETC:VPTHRESHD.INI_TEMPLATE VVP_ETC:VPTHRESHD.INI
4. Edit VPTHRESHD.INI and put in the values you want (see the example below)
5. Start the threshold agent:
$@VVP_HOME:VPTHRESHD_STARTUP.COM
Remember to adjust the node-specific values in the example below (in this example, the disk is DKA300 and the node name is TESSA). The example shows how to run command files with EXECSHELL, send VMS mail, send SMTP mail to the Internet, and send SNMP traps to a trap log (items 4, 1, 2, and 3 above).
An example of the vpthreshd.ini file in the [VIEWPOINT.ETC] directory is shown below.
!
! Example configuration file for VPTHRESHD
!
! The first thing that needs to be specified is the host trace file name...
! The default file specification is VVP_DATA:.TF, so only the file name
! is needed if we are using the defaults
!
tracefile TESSA
!
! here we define some optional SNMP variables for our user-defined traps
! The 'TYPE' and 'DESCRIPTION' portions are optional
!
SNMPVAR VAR1 OID 1 VALUE $VARNAME() TYPE STRING DESCRIPTION "TEST SNMP VARIABLE 1"
SNMPVAR VAR2 OID 2 VALUE $SEVERITY() TYPE INTEGER DESCRIPTION "TEST SNMP VARIABLE 2"
!
! here we define a user-defined trap
! The 'DESCRIPTION' and 'MIBTEXT' sections are optional
!
SNMPTRAP TRAP1 SPECIFIC 1 VARIABLES var1,var2 DESCRIPTION "SNMPTRAP 1 DESCRIPTION" MIBTEXT "MIB TEXT"
!
! Now create a set of conditional thresholds that we monitor. We will try to
! hold the assumptions to a minimum, but will assume:
!
! 1) 60 second collector sampling intervals
!
! Set up a condition if the collector is stopped...if so, then send
! a mail message to SYSTEM and stop.
!
!Live Test: if $tracefiledead() then
! SENDMAILTO SYSTEM
! MESSAGE "Host collector has quit...exiting"
! ECHO "SENT TRACEFILE DEAD MESSAGE TO SYSTEM"
! EXITPROGRAM
! ENDIF
!
! Set up a condition that fires if the host collector becomes stuck for 5
! intervals...if so, then execute a command procedure
!
Stuck Test: if $tracefilestuck(5) then
EXECSHELL "jumpstart_collector"
echo "Trace file {$tracefilename()} stuck at {$time()}"
endif
!
! This test will fire if the operation count for any disk is greater than
! 50 operations / second for 5 consecutive intervals...if so, we send
! mail to the user named in SENDMAILTO (randell in this example)
!
Opcnt test: if opcnt[*] > 50.0 for 5 minutes then
SENDMAILTO randell
MESSAGE "DISK OPCNT for {$VARNAME()} has been high for {$duration()} seconds"
echo "mail message sent to {$addresslist()} at {$time()}"
endif
!
! This test will fire if the Workload for system is greater than 60% for any 5
! out of the last 10 intervals...if so, we send a trap message to nodemgr
!
WL test: if W/LCPU[SYSTEM] > 0 THEN
IF $persist(10) > 0.5 then
SENDTRAP TRAP1 TO 198.242.57.105
ENDIF
endif
!
! This test will fire if the free space on any disk drops below 100000 blocks...
! if so, we send the Datametrics pre-defined old-style trap to NODEMGR
!
Disk free space test: if freekblks[*] < 100000 then
SENDTRAPTO NODEMGR
MESSAGE "Free blocks for {$varname()} is below 100K"
SEVERITY 99
endif
!
!
MAXPROCESSCNT test: if MAXPROCESSCNT > 0.0 for 1 minutes then
execshell max.com
endif
!
!
SGAFIXED test: if SGAFIXED > 0.0 for 1 minutes then
SENDMAILTO smtp%"[email protected]"
MESSAGE "SGAFIXED for {$VARNAME()} has been high for {$duration()} seconds"
echo "mail message sent to {$addresslist()} at {$time()}"
endif
LEF test: if LEF > 0.0 for 1 minutes then
SENDMAILTO system
MESSAGE "LEF for {$VARNAME()} has been high for {$duration()} seconds"
echo "mail message sent to {$addresslist()} at {$time()}"
endif
!
HIB test: if HIB > 0.0 for 1 minutes then
SENDMAILTO system
MESSAGE "HIB for {$VARNAME()} has been high for {$duration()} seconds"
echo "mail message sent to {$addresslist()} at {$time()}"
endif
!
SGAFREE test: if SGAFREE > 0.0 for 1 minutes then
SENDMAILTO smtp%"[email protected]"
MESSAGE "SGAFREE for {$VARNAME()} has been high for {$duration()} seconds"
echo "mail message sent to {$addresslist()} at {$time()}"
endif
!
DISKCOUNT[TESSA$DKA300] test: if DISKCOUNT[TESSA$DKA300] > 0.0 for 1 minutes then
SENDMAILTO system
MESSAGE "DISKCOUNT[TESSA$DKA300] for {$VARNAME()} has been high for {$duration()} seconds"
echo "mail message sent to {$addresslist()} at {$time()}"
endif
!
DISKTOTALKBLKS[TESSA$DKA300] test: if DISKTOTALKBLKS[TESSA$DKA300] > 0.0 for 1 minutes then
SENDMAILTO system
MESSAGE "DISKTOTALKBLKS[TESSA$DKA300] for {$VARNAME()} has been high for {$duration()} seconds"
echo "mail message sent to {$addresslist()} at {$time()}"
endif
!
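
The timing clauses in the example (for 5 minutes, $persist(10) > 0.5) are explained only by the comments in the file, so the short Python sketch below models one plausible reading of them. It is not VPTHRESHD code. The function names fires_consecutive and persist_fraction are hypothetical, and the sketch assumes that "for N minutes" with 60-second sampling means N consecutive samples over the limit, and that $persist(n) yields the fraction of the last n samples in which the condition held.

```python
def fires_consecutive(samples, limit, intervals):
    """Assumed model of 'if var > limit for N minutes' with 60-second samples:
    true only when the most recent `intervals` samples all exceed `limit`."""
    if len(samples) < intervals:
        # Not enough history yet, so the test cannot have held long enough.
        return False
    return all(value > limit for value in samples[-intervals:])


def persist_fraction(samples, limit, window):
    """Assumed model of $persist(window): the fraction of the last `window`
    samples that exceed `limit`. Missing history counts as 'not exceeded'."""
    recent = samples[-window:]
    hits = sum(1 for value in recent if value > limit)
    return hits / window


if __name__ == "__main__":
    # Five one-minute disk operation-rate samples, all above 50 ops/second.
    opcnt = [60.0, 55.0, 70.0, 65.0, 80.0]
    print(fires_consecutive(opcnt, 50.0, 5))

    # Ten workload samples, exactly five of them above zero.
    workload = [1, 0, 1, 0, 1, 0, 1, 0, 1, 0]
    print(persist_fraction(workload, 0, 10) > 0.5)
```

Under this reading, a strict > 0.5 comparison over 10 intervals needs 6 violating samples rather than 5, so check the behavior on your own system before relying on the "5 out of the last 10 intervals" wording in the comment.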