Core Dump Management on the Solaris OS

By Adam Zhang, Sun Microsystems, April 2. Updated June 2. 00.

Abstract: Abnormal termination of a process triggers a core dump file. A core dump file helps programmers and support engineers determine the root cause of the abnormal termination, because it records the runtime status of the process at crash time. This article provides information about core dumps, as well as the features and analysis tools in the Solaris Operating System that can be used to manage them.

Note: The information provided in this article is mainly for the Solaris 10 OS.

Types of Core Dumps: Process and System

A core dump is a file that records the contents of a process along with other useful information, such as the values of the processor registers. There are two types of core dumps: system core dumps and process core dumps. They differ in many aspects, such as the manner in which they are created and the method used to analyze them.

Cause of Process Core Dumps

When an application process receives a specific signal and terminates, the system generates a core dump and stops the process. In most cases, the signal leading to the application crash is SIGSEGV or SIGBUS.

SIGSEGV indicates that the application is accessing an invalid memory address. This situation often occurs in C/C++ programs that contain errors in pointer manipulation. On the Solaris OS, you can use the libumem(3LIB) library as the user-mode memory allocator instead of the allocator in libc. The libumem library can help find memory leaks, buffer overflows, attempts to use freed data, and many other memory allocation errors. Its memory allocator is also very fast and scales well with multithreaded applications.

SIGBUS indicates that the application is accessing a memory address that does not conform to the CPU's memory alignment rules. This usually happens on a system with an UltraSPARC processor. Systems with x86 CPUs can handle unaligned memory addresses, but there is a performance impact. The Sun Studio C/C++ compiler has the -xmemalign option, which can be used to adjust the behavior on the UltraSPARC CPU when there are unaligned memory addresses that can be determined at compile time. The -xmemalign option causes the compiler to generate additional load/store instructions for unaligned memory access. However, the -xmemalign option cannot handle unaligned memory accesses that arise only at runtime. If unaligned memory access happens during runtime, the developer needs to change the source code.

There are other signals whose default disposition is to create a core dump, for example, SIGFPE, which indicates a floating-point exception. The signal(3HEAD) man page provides more details.

How to Manage a Process Core Dump
The Solaris OS attempts to create up to three core dump files for each abnormally terminated process. One of the core dump files, which is called the per-process core file, is located in the current directory. Another core dump file, which is called the global core file, is created in the system-wide location. If the process is running in a local zone, a third core file is created in the global zone's location.

You can use the coreadm(1M) command to manage the core dumps. All the settings are saved in the /etc/coreadm. Below is a typical scenario, which shows the current system configuration for core dumps.

In the previous output:

The global core dumps: disabled line indicates that no global core dump will be generated.

The per-process core dumps: enabled line indicates that a per-process core dump will be generated for each abnormally terminated process.

The init core file pattern line indicates the contents that will be gathered from the live process into the per-process core dump.

You can also use the coreadm command to control the core dump file name. This command causes the program file name (%f) and the runtime process ID (%p) to be appended to the per-process core file name. A core dump file will be generated in the current working directory of the process.

By default, the global core dump is disabled. You need to use the coreadm command with the -e global option to enable it. The -g option causes the command to append the program name (%f) and the runtime process ID (%p) to the core file name.

As indicated previously, coreadm can specify the parts of the process that will be saved to the core file. Previously, when you performed a post-mortem analysis, you needed to obtain the exact versions of all the dependent libraries and runtime modules, because the core dump file did not contain this text information. Recreating the environment of the original machine is quite a headache for programmers.
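To make the effect of the %f and %p tokens described earlier in this section concrete, here is a minimal C sketch that expands them into a file name. The function expand_core_pattern() is a hypothetical helper written for illustration, not part of the Solaris OS, and the pattern core.%f.%p used in the note below is an assumed example. Only the two tokens named in this article are handled; any other tokens that coreadm(1M) documents are ignored here.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not a Solaris API): expands the two coreadm
 * pattern tokens discussed in the text, %f (program file name) and
 * %p (process ID), into out. Every other character is copied verbatim.
 * Returns 0 on success, -1 if the result does not fit in out.
 */
int expand_core_pattern(const char *pattern, const char *prog, long pid,
                        char *out, size_t outlen)
{
    size_t used = 0;

    if (outlen == 0)
        return -1;
    out[0] = '\0';

    for (const char *p = pattern; *p != '\0'; p++) {
        char piece[32];
        const char *src;

        if (p[0] == '%' && p[1] == 'f') {
            src = prog;              /* program file name */
            p++;                     /* skip the 'f' */
        } else if (p[0] == '%' && p[1] == 'p') {
            snprintf(piece, sizeof(piece), "%ld", pid);
            src = piece;             /* process ID as decimal text */
            p++;                     /* skip the 'p' */
        } else {
            piece[0] = *p;           /* ordinary character */
            piece[1] = '\0';
            src = piece;
        }

        size_t n = strlen(src);
        if (used + n >= outlen)      /* keep room for the terminator */
            return -1;
        memcpy(out + used, src, n + 1);
        used += n;
    }
    return 0;
}
```

With the assumed pattern core.%f.%p, a program named tServer running as process 1234 would get the name core.tServer.1234, which keeps core files from different programs and runs from overwriting each other.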
With the default configuration, the Solaris OS applies the "default" pattern to each process core dump, which means the process core dump contains stack, heap, text, shared memory (SHM), intimate shared memory (ISM), and dynamic intimate shared memory (DISM) information, plus other information. The text part of the process core dump also contains a partial symbol table (dynsym), which helps you get a readable stack trace directly from one core file without needing the dependent libraries. If the dynsym is insufficient, you can use coreadm to include all symbol tables, as follows.

    coreadm -G all -I all

The previous command makes both the global core file (-G) and the per-process core file (-I) contain all the parts of the process. Here's how to use coreadm to verify the changes.

The coreadm command is used to edit the configuration file of the coreadm service, which is managed by the Service Management Facility (SMF) with this service identifier: svc:/system/coreadm:default.

How to Create a Process Core Dump Manually

The Solaris OS provides the gcore(1) command in case you need to create a core dump manually for a live process for analysis purposes. The live process ID is appended automatically to the name of the generated core dump.

In the previous example, the process of the current shell is dumped and its process ID is 2.

Note: There are other constraints you need to take into account while generating the core dump, for example, the write permissions on the destination directory, the existence of the destination directory, the file system mount options, and process resource limits. For resource limit information, refer to the man pages for setrlimit(2) and ulimit(1).

Another useful tool called AppCrash is available. It automatically collects diagnostic and debugging information when any application crashes under the Solaris OS. This article does not address its usage. For more information on using AppCrash, refer to Greg Nakhimovsky's blog.
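The resource limit mentioned in the note above can also be adjusted from inside a program. The following is a minimal sketch, assuming only the standard getrlimit(2) and setrlimit(2) calls with RLIMIT_CORE; the function names raise_core_limit() and print_core_limit() are illustrative and do not come from the original article.

```c
#include <stdio.h>
#include <sys/resource.h>

/*
 * Raises the soft RLIMIT_CORE limit of the calling process to its hard
 * limit, so that a small soft limit does not truncate or suppress the
 * core file. Returns 0 on success, -1 on failure.
 */
int raise_core_limit(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_CORE, &rl) != 0)
        return -1;

    rl.rlim_cur = rl.rlim_max;   /* soft limit may be raised up to the hard limit */
    return setrlimit(RLIMIT_CORE, &rl) == 0 ? 0 : -1;
}

/* Prints the current soft core-size limit for inspection. */
void print_core_limit(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_CORE, &rl) != 0) {
        perror("getrlimit");
        return;
    }
    if (rl.rlim_cur == RLIM_INFINITY)
        printf("core soft limit: unlimited\n");
    else
        printf("core soft limit: %llu bytes\n",
               (unsigned long long)rl.rlim_cur);
}
```

Calling raise_core_limit() early in main() is one way to make sure a later crash is not silently left without a core file because of a soft limit of zero. It cannot exceed the hard limit, which an unprivileged process is not allowed to raise.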
How to Analyze a Process Core Dump File

There are several tools in the Solaris OS for analyzing core dump files: dbx(1), mdb(1), and pstack(1). The most convenient method is to use the pstack tool to print the process stack. It also shows the stack of each thread in a multithreaded program.

    DhpiEread6FipvI_I_ (b, 8. 04. 28.
    JVM_Read (b, 8. 04.
    CallStaticVoidMethod (8. 06. 85b.
    CosHSolarisFEventEpark6M_v_ (8. 11.

In general, if the program's symbol table is not stripped and its runtime stack trace is available, you can expect almost 5. Sun Studio software.

Sun Studio software includes free, optimizing C, C++, and Fortran compilers that can be used on both the Solaris OS and Linux. Here is a typical scenario for analyzing the core file using dbx. For more details on dbx, please refer to the document called Sun Studio 1. Debugging a Program With dbx.

    SUNWspro/bin/dbx tServer core
    For information about new features see 'help changes'.
    To remove this message, put 'dbxenv suppress_startup_message 7.
    Reading tServer
    Reading ld.so.1
    Reading libpthread.
    Reading librt.so.
    Reading libsocket.
    Reading libnsl.so.
    Reading libc.so.
    Reading libthread.
    Reading libCrun.
    Reading libm.so.
    Reading libkstat.
    SEGV (no mapping at.
    Current function is txnAtomMatchRqst
    Msg->inHeaderVer, "0. 1" == 0)) {
    SIGSEGV in strcmp().
    TimerThread() LWP suspended in __pollsys().
    Thread t@1 (0xffffffff.
    Currently active in strcmp.
    AtomMatchRqst(), line 1.
    AtomMatchRqstFlow(), line 9. 6 in "tFlow.c".
    [4] tServer(rqst = 0x.
    Server.c".
    [5] _tsvcdsp(0x.

From the previous example, you can use dbx to determine the abnormal thread, which is marked with "o," and find its root cause by showing the source code. Of course, this is possible only if you have the application source code and compiled the program with debug information.
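A SIGSEGV reported inside strcmp(), as in the dbx session above, often means that one of the string pointers passed to it was null or otherwise invalid. Whether that is the actual root cause in the session above is not shown in the article. The following C sketch only illustrates the kind of defensive check such an analysis typically leads to. The struct rqst_msg type, its in_header_ver field, and the header_version_matches() function are hypothetical names that merely echo the trace; they are not the author's code.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical message layout; the field name only echoes the trace. */
struct rqst_msg {
    const char *in_header_ver;   /* may be NULL if the header was never filled in */
};

/*
 * Returns 1 when the message carries the expected version string and
 * 0 otherwise. Null pointers are rejected before strcmp() is called,
 * because calling strcmp() with a null pointer is undefined behavior
 * and typically ends in SIGSEGV.
 */
int header_version_matches(const struct rqst_msg *msg, const char *expected)
{
    if (msg == NULL || msg->in_header_ver == NULL || expected == NULL)
        return 0;
    return strcmp(msg->in_header_ver, expected) == 0;
}
```

A check like this guards only against null pointers. A pointer to freed or corrupted memory would still crash, which is where an allocator such as libumem helps track down the real defect.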
If you are familiar with assembly language and hardware specifications, you can use mdb to debug the core file, because mdb is a low-level debugging utility for both programs and the Solaris OS.

Cause of System Core Dumps

There are many reasons why the Solaris OS might crash and produce a core dump. Not only software problems, such as faulty drivers and programs, but also hardware errors can induce a system core dump.

How a System Core Dump Is Created

When the Solaris OS detects that data integrity has been compromised or that a fatal hardware error has occurred, it invokes panic(). The panic() routine interrupts all processes, as if the OS were suspended. It then generates a system core dump, which is a copy of the OS in memory, and saves it to the dump device. During the next boot after a crash, the OS uses savecore(1M) to retrieve the core dump from the dump device and store it in the savecore directory. The savecore routine generates two files. One file is unix.<X>, which is an OS symbol table list, and the other is vmcore.<X>, which is the core dump data file.