Dump analysis and elimination, or DAE, uses the SYS1.DAE dataset to track and store the symptoms of SVC dumps and SYSMDUMP dumps. When a duplicate dump is encountered, DAE can suppress it, so you don't wind up with multiple copies of the same thing. Parmlib members named ADYSETxx are used to control DAE processing.

SYS1.LOGREC is the dataset used to store problem summary records for the software and hardware errors that occur on the system. After the dataset is initialized at z/OS installation, z/OS records error records automatically. The installation is responsible for archiving the records.

The SYSLOG is a timestamped log file that captures the hard-copy log, which tracks the commands, responses, and messages handled by MVS console management. It also logs system job activity, and it starts recording automatically at IPL time. It can be written out to a JES output file with the WRITELOG command.

The basic problem determination component is a dump. A dump is a picture of the failing program and the system at the time of the problem. It's like saying, hey, drop everything, we need to investigate what was happening at this specific point in time. There are user dumps and system dumps. User dumps are taken and directed through JCL. SYSUDUMP and SYSABEND dumps are formatted and contain limited information; SYSMDUMP dumps require IPCS for viewing, and they may be useful in certain situations.

What's IPCS, you might ask? It's the Interactive Problem Control System, I-P-C-S. It's a tool provided to aid in the diagnosis of system software failures. It formats dumps and traces, as well as output from other applications that run on MVS. System dumps, like SVC dumps and stand-alone dumps, require IPCS for viewing, and they can be initiated from the MVS console, a recovery routine, or a SLIP trap. That's a concept we'll talk about later. Stand-alone dumps are taken after a system outage and contain the most information.
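As a quick sketch of how those user dumps get directed through JCL: you add the matching DD statement to the job step, and the abend dump of that type goes wherever the DD points. The dataset name below is hypothetical.

```
//* Formatted dump, printed to SYSOUT -- readable without IPCS
//SYSUDUMP DD SYSOUT=*
//* Machine-readable dump written to a dataset -- view it with IPCS
//SYSMDUMP DD DSN=HLQ.MYJOB.MDUMP,DISP=(NEW,CATLG),
//            UNIT=SYSDA,SPACE=(CYL,(25,10),RLSE)
```

You'd code one or the other for a given step; SYSABEND uses the same pattern as SYSUDUMP, just with a SYSABEND DD name.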
It makes sense: the system is down, so it seems like a good time to take everything in our universe and write it out to disk.

This is a nice chart. It's everything in one place. You get to see the bigger picture, and we have a few words on how to put it to good use. An abend dump is used when a program ends because of an unrecoverable error. The dump shows the virtual storage for the program requesting the dump and the system data associated with the failing program. There are three types of abend dumps: SYSABEND, SYSMDUMP, and SYSUDUMP. Each dumps different areas, and it's important to select the dump that gives the areas needed for diagnosing the specific problem. The SYSABEND dump is the largest of the three; it contains a summary dump plus many other areas that are useful in analysis. The SYSMDUMP contains the summary dump plus some system information for the failing task, and the SYSUDUMP is the smallest; it contains data and areas only about the failing program.

Then there are SNAP dumps and stand-alone dumps. SNAP dumps are useful when you're testing a problem program. A SNAP dump shows one or more areas of virtual storage. A series of SNAP dumps can show an area at different stages in order to get a snapshot, see how that works, of what's going on over time. These are pre-formatted; you don't need to use IPCS to format them. Stand-alone dumps are used when the system stops processing, when it's in a disabled wait state, or when it's stuck in a loop that's making things slow or unresponsive. They show main storage plus some paged-out virtual storage occupied by the system or by the stand-alone dump program that failed.

SVC dumps can be used to investigate unexpected system errors while the system continues processing, or when an authorized program or operator needs diagnostic data to solve a problem. They contain a summary dump, control blocks, and other system code.
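To make the operator-initiated case concrete, here's a sketch of requesting an SVC dump from the MVS console. The job name and dump title are hypothetical; the system prompts for the operands with a WTOR, and you reply with what to dump.

```
DUMP COMM=(DUMP OF MYJOB FOR PROBLEM INVESTIGATION)
*01 IEE094D SPECIFY OPERAND(S) FOR DUMP COMMAND
R 01,JOBNAME=MYJOB,SDATA=(CSA,SQA,RGN,TRT),END
```

The SDATA list controls which system areas are included, which is why the areas in an SVC dump vary with how it was requested.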
But the exact areas dumped depend on whether the dump was requested by a macro, a command, or a SLIP trap.

Then there are component traces, which are used when you need trace data to report a component problem to IBM support. A component trace shows the processing within an MVS component. GFS trace is used to collect information about requests for virtual storage through the GETMAIN, FREEMAIN, and STORAGE macros. GTF trace is used to show system processing over time; it can be written out to an external dataset as well as to a buffer. Master trace shows the messages to and from the master console. System trace is also used to see system processing over time; it runs continuously and records many system events with minimal detail about each one.

AMBLIST, A-M-B-LIST, is used when you need information about the contents of load modules and program objects, or when you have a problem related to the modules on your system. Common storage tracking is used to collect data about requests to obtain or free storage in CSA, ECSA, SQA, or ESQA. DAE, dump analysis and elimination, is used to eliminate duplicate or unneeded dumps. IPCS, that's used to format and analyze dumps and traces. The LOGREC dataset is your starting point for problem determination. It's got hardware errors, selected software errors, and selected system conditions.

SLIP traps, we mentioned these briefly before, but SLIP stands for serviceability level indication processing. So yeah, SLIP trap. Let's just stick with SLIP trap; it will save us all a whole lot of time. Those are used to catch problem data or error events as they happen, like setting a trap. When an event matches what's described in the SLIP trap, boom, it leaps into action and performs the problem determination action you specified.

Then there's S-P-Z-A-P, SPZAP. This is a service aid used to dynamically update and maintain programs and datasets.
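Here's a sketch of what setting one of those traps looks like from the console. The job name is hypothetical; the idea is: when job PAYROLL takes a system completion code 0C4, take an SVC dump, and stop after the first match.

```
SLIP SET,COMP=0C4,JOBNAME=PAYROLL,MATCHLIM=1,ACTION=SVCD,END
```

MATCHLIM=1 disables the trap after it fires once, so a looping failure doesn't flood you with dumps; ACTION=SVCD is what makes the match produce an SVC dump.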
For problem determination, you can use it to fix program errors, to insert an incorrect instruction into a program to force an abend or make a SLIP trap trigger, to alter instructions in a load module to start component trace, or to replace data directly on a DASD volume, for example data records that got damaged during an I/O or program error. I feel like I'm about to have a few failures myself, so let's take a quick break, and then come back to learn about the recovery termination manager.
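Before the break, a sketch of that force-an-abend use of SPZAP. The load library, program, CSECT name, offset, and instruction contents here are all hypothetical; the pattern is a batch job running AMASPZAP with NAME, VER, and REP control statements.

```
//ZAPSTEP  EXEC PGM=AMASPZAP
//SYSPRINT DD SYSOUT=*
//SYSLIB   DD DSN=MY.TEST.LOADLIB,DISP=SHR
//SYSIN    DD *
 NAME MYPROG MYCSECT
 VER 01A4 47F0,C0B8
 REP 01A4 0000,0000
/*
```

VER checks that the bytes at the offset really contain what you expect before REP overwrites them; if the verify fails, no change is made, which is your safety net against zapping the wrong spot. Overlaying an instruction with X'0000' gives you an invalid opcode, so the program abends when it executes that location.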