Problem
Your
database is hung. Users aren’t able to log in, and existing users can’t
complete their transactions.The DBAs with SYSDBA privileges may also be unable
to log in to the database. You need to find out what is causing the database to
hang, and fix the problem.
Solution
Follow
these general steps when facing a database that appears to be hung:
1.
Check your alert log to see if the
database has reported any errors, which mayindicate why the database is
hanging.
2.
See if you can get an AWR or ASH report or
query some of the ASH views, as explained in Chapter 5. You may notice events
such as hard parses at the top of the Load Profile section of the AWR report,
indicating that this is what is slowing down the database.
3.
A single ad hoc query certainly has the
potential to bring an entire database to its knees. See if you can identify one
or more very poorly performing SQL statements that may be leading to the hung
(or a very poorly performing)
database.
4.
Check the database for blocking locks as
well as latch contention.
5.
Check the server’s memory usage as well as
CPU usage. Make sure the sessions aren’t stalling because you’ve sized the PGA
too low.
6.
Don’t overlook the fact that a
scary-looking database hang may be caused by something as simple as the filling
up of all archive log destinations. If the archive destination is full, the
database will hang, and new user connections
will
fail. You can, however, still connect as the SYS user, and once you make room
in the archive destination by moving some of the archived redo log files, the
database becomes accessible to the users.
7.
Check the Flash Recovery Area (FRA). A
database also hangs when it’s unable to write Flashback Database logs to the
recovery area. When the FRA fills up, the database won’t process new work and
it won’t spawn new database
connections.
You can fix this problem by making the recovery area larger with the alter system set db_recovery_file_dest_size command.
If
you’re still unable to resolve the reasons for the hung database, you most
likely have a truly hung database. While you’re investigating such a database,
you may sometimes find yourself unable to connect and log in. In that case, use
the “prelim” option to log in to the database. The prelim option doesn’t
require a real database connection. Here's an example that shows how to use the
prelim option to log into a database:
C:\app\ora\product\11.2.0\dbhome_1\bin>sqlplus
/nolog
SQL*Plus: Release 11.2.0.1.0 Production on Sun Mar
27 10:43:31 2011
Copyright (c) 1982, 2010, Oracle. All rights
reserved.
SQL> set _prelim on
SQL> connect / as sysdba
Prelim connection established
Alternatively,
you can use the command sqlplus
-prelim "/ as sysdba" to log
in with the –prelim option. Note that you use the nolog
option to open a SQL*Plus session. You
can’t execute the set _prelim on command if you’re already connected to the database. Once you
establish a prelim connection as shown here, you can execute the oradebug hanganalyze command to analyze a hung database—for example:
SQL> oradebug hanganalyze 3
Statement processed.
SQL>
In
an Oracle RAC environment, specify the oradebug hanganalyze command
with additional options,
as
shown here:
SQL> oradebug setinst all
SQL> oradebug -g def hanganalyze 3
You
can repeat the oradebug hanganalyze command a couple of times to generate dump files for varying
process states. In addition to the dump files generated by the hanganalyze command,
Oracle Support may often also request a process state dump, also called a systemstate dump,
to analyze hung database conditions. The systemstate
dump will report on what the processes are
doing and the resources they’re currently holding. You can get a systemstate dump
from a non-RAC system by executing the following set of commands.
SQL> oradebug setmypid
Statement processed.
SQL> oradebug dump systemstate 266
Statement processed.
SQL>
Issue
the following commands to get a systemstate
dump in a RAC environment:
SQL> oradebug setmypid
SQL> oradebug unlimit
SQL> oradebug -g all dump systemstate 266
Note
that unlike the oradebug hanganalyze command, you must connect to a process. The setmypid option
specifies the process, in this case your own process. You can also specify a
process ID other than yours, in which case you issue the command oradebug setmypid before issuing the dump
systemstate command. If you try to issue the dump systemstate command
without setting the PID, you’ll receive an error:
SQL> oradebug dump systemstate 10
ORA-00074: no process has been specified
You
must take the systemstate dumps a few times, with an interval of about a minute or so in between
the dumps. Oracle Support usually requests
How
It Works
A true database hang can be due to a variety of reasons, including
a system that has exhausted resources such as the CPU or memory, or because
several sessions are stuck waiting for a resource such as a lock. While the database can
automatically resolve deadlocks between sessions (by killing one of the sessions
holding a needed lock), when there’s a latch or pin on an internal kernel-level
resource, Oracle is sometimes unable to automatically detect and resolve the
internal deadlock—and this leads to what Oracle Support calls a “true database
hang.” A true database hang is thus an internal deadlock or a cyclical
dependency among multiple processes. Oracle Support will usually ask you to
provide them the hanganalyze trace files and multiple systemstate dumps to
enable them to diagnose the root cause of your hang. At times like this, you
may not even be able to log into the database. Your first instinct when you
realize that you can’t even log in to a database is to try shutting down and
restarting, often referred to as bouncing the database. Unfortunately, while
shutting down and restarting the database may
“resolve” the issue, it’ll also disconnect all users—and you’re no
wiser as to what exactly caused the problem. If you do decide to bounce your
database, quickly generate a few hanganalyze and systemstate dumps first.
■ Tip As unpleasant as it may be at times, if
you find that you simply can’t connect to a hung database, then collect any
trace dumps you may need, and quickly bounce the database so that users can
access their applications. Especially when you’re dealing with a database
that’s hanging because of memory issues, bouncing the instance may get things
going again quickly.
If you find that the database is completely unresponsive, and you
can’t even log in to the database with the SYSDBA privilege, you can use the
prelim option to log into the database. The prelim option stands for
preliminary connection, and it starts an Oracle process and attaches that
process to the SGA shared memory. However, this is not a full or complete
connection, but a limited connection where the structures for query execution
are not set up—so, you cannot even query the V$ views. However, the prelim
option lets you run oradebug commands to get error dump stacks for diagnostic
purposes. The output of the hanganalyze command
can tell Oracle Support engineers if your database is really hanging, because
of sessions waiting for some resource. The command makes internal kernel calls
to find out all sessions that are waiting for a resource and shows the
relationship between the blocking and waiting sessions. The hanganalyze option that you can specify with either the oradebug command or an alter session statement
produces details about hung sessions. Once you get the dump file, Oracle
Support personnel can analyze it and let you know the reasons for the database
hang.
You can invoke the hanganalyze command
at various levels ranging from 1 to 10. Level 3 dumps processes that are in a
hanging (IN_HANG) state. You normally don’t need to specify a level
higher than 3, because higher levels will produce voluminous reports with too
many details about the processes
Note
that we issued the oradebug command to get a systemstate
dump with a level of 266. Level 266 (combination
of Level 256, which produces short stack information, and Level 10) is for
Oracle releases 9.2.0.6 and onward (earlier releases used systemstate level 10). Level 266 allows you to dump the short stacks for each
process, which are Oracle function calls that help Oracle development teams
determine which Oracle function is causing the problem. The short stack
information also helps in matching known bugs in the code. On Solaris and Linux
systems, you can safely specify level 266, but on other systems, it may take a
long time to dump the short stacks. Therefore, you may want to stick with level
10 for the other operating systems. If you can find out the blocking session,
you can also take a dump just for that session, by using the command oradebug setospid nnnn, where nnnn is the blocking session’s PID, and then invoking the oradebug command,
as shown here:
SQL> oradebug setospid 9999
SQL> oradebug unlimit
SQL> oradebug dump errorstack 3
Note
that you can generate the hanganalyze
and systemstate
dumps in a normal session (as well as in a
prelim session), without using the oradebug
command. You can invoke the hanganalyze command
with an alter session command, as shown here.
SQL>alter session set events 'immediate trace
name hanganalyze level 3';
Similarly,
you can get a systemstate
dump with the following command:
SQL> alter session set events 'immediate trace
name SYSTEMSTATE level 10';
Session altered.
SQL>
The
oradebug and systemstate dumps are just two of the many dumps you can collect. Use the
oradebug dumplist command
to view the various error dumps you can collect.
SQL> oradebug dumplist
TRACE_BUFFER_ON
TRACE_BUFFER_OFF
LATCHES
PROCESSSTATE
SYSTEMSTATE
INSTANTIATIONSTATE
REFRESH_OS_STATS
CROSSIC
CONTEXTAREA
HANGDIAG_HEADER
HEAPDUMP
…
Note
that while you can read some of the dump files in an editor, these files are
mainly for helping
Oracle
Support professionals troubleshoot a database hang situation. There’s not much
you can do with the dump files, especially when a database hang situation is
due to an Oracle bug or a kernel-level lock, except to send them along to
Oracle Support for analysis.