Thoughts of Oracle !: March 2015

Problem :

Sun Mar 29 17:47:42 2015
ARC0: Attempting destination LOG_ARCHIVE_DEST_2 network reconnect (3113)
ARC0: RFS network connection re-established at host 'pbldg'
ARC0: Archive log rejected (thread 1 sequence 4898) at host 'pbldg'
FAL[server, ARC0]: FAL archive failed, see trace file.
ARCH: FAL archive failed. Archiver continuing
ORACLE Instance pbldb1 - Archival Error. Archiver continuing.
Sun Mar 29 18:00:00 2015

############################################################################

It is definitely not an archiving file system full. If it was the case then you would see archiver stuck messages in the alert log.

This is due to a bug and Oracle suggests to ignore these messages appearing in the Alert log as long as the Primary and Standby are in sync.

RC3: Archive log rejected (thread 1 sequence 136480) at host '(DESCRIPTION= (ADDRESS=(PROTOCOL=tcp)(HOST=192.168.1.26)(PORT=1530))(CONNECT_DATA=(SERVICE_NAME=sdbinfo)(SERVER=DEDICATED)))'

Errors in file /opt/dbinfo/diag/rdbms/dbinfo/dbinfo/trace/dbinfo_arc3_7445.trc:

ORA-16401: archivelog rejected by RFS

FAL[server, ARC3]: FAL archive failed, see trace file.

Errors in file /opt/dbinfo/diag/rdbms/dbinfo/dbinfo/trace/dbinfo_arc3_7445.trc:

ORA-16055: FAL request rejected

ARCH: FAL archive failed. Archiver continuing

ORACLE Instance dbinfo - Archival Error. Archiver continuing.

- RedoLog switch happens very frequently before the above Error reported

- The requested ArchiveLog by FAL requests the current Log Sequence or the Sequence currently being archived

Changes

The Problem here is that the Primary Database is switching Logs too frequently.

Using ARCH to send the archives, every time there's a log switch the Primary has to send the Archivelog to the Standby, meanwhile another Log Switch occurred on the Primary which causes also another Archivelog to be sent to the Standby, but the first one has not finished yet, a GAP is formed and detected by the Standby. At this Time the first Archivelog is also sent as FAL Request, but this one will fail because the first one is still being archiving, locked, so the second one fails.

Solution

Ignore these Messages as long as the Standby Database keeps synchronized with the Primary

Database Increase the Size of the Online Redologs to reduce Redolog Switch Frequency

Increase Network Bandwith between the Primary and Standby Database

Source : http://scn.sap.com/thread/3351294

Alert Log Errors: 12170 TNS-12535/TNS-00505: Operation Timed Out (Doc ID 1628949.1)

In this Document

APPLIES TO:

Oracle Net Services - Version 11.2.0.3 to 12.1.0.1 [Release 11.2 to 12.1]
Information in this document applies to any platform.

SYMPTOMS

The following error is reported in the database alert log.

***Note the "Client address" is posted within the error stack in this case.

Fatal NI connect error 12170.

VERSION INFORMATION:
TNS for 64-bit Windows: Version 11.2.0.3.0 - Production
Oracle Bequeath NT Protocol Adapter for 64-bit Windows: Version 11.2.0.3.0 - Production
Windows NT TCP/IP NT Protocol Adapter for 64-bit Windows: Version 11.2.0.3.0 - Production
Time: 22-FEB-2014 12:45:09
Tracing not turned on.
Tns error struct:
ns main err code: 12535

TNS-12535: TNS:operation timed out
ns secondary err code: 12560
nt main err code: 505

TNS-00505: Operation timed out
nt secondary err code: 60
nt OS err code: 0
***Client address: (ADDRESS=(PROTOCOL=tcp)(HOST=121.23.142.141)(PORT=45679))

The PORT field here is the ephemeral port assigned to the client for this connection.
This does not correspond to the listener port.

CHANGES

No changes are necessary, but may have recently upgraded the database to 11g release 1 or higher, or installed a new Oracle11g database and they are now visible in the alert log.
Note: Prior to 11gR1 these same 'Fatal NI connect error 12170' are written to the sqlnet.log. This document describes a problem that arises when a firewall exists between the client and the database server.

CAUSE

We can search the listener log covering the same time period using this search criteria.

(HOST=121.23.142.141)(PORT=45679)

The 11g listener log in text format is located here:

$ORACLE_BASE/diag/tnslsnr///trace/.log

Again, this is the client's IP address and the unique ephemeral port assigned to the client for this connection.

In this case, we find that this connection was established at the listener at this timestamp:

22-FEB-2014 10:42:10 * (CONNECT_DATA=(SID=test)(CID=(PROGRAM=)(HOST=__jdbc__)(USER=))) * (ADDRESS=(PROTOCOL=tcp)(HOST=121.23.142.141)(PORT=45679)) * establish * test* 0 .

Compare this to the event in the alert.log with special attention to the timestamp.
The connection was dropped by the instance at 22-FEB-2014 12:45:09 or roughly 2 hours later.

Time: 22-FEB-2014 12:45:09
Tracing not turned on.
Tns error struct:
ns main err code: 12535

TNS-12535: TNS:operation timed out
ns secondary err code: 12560
nt main err code: 505

TNS-00505: Operation timed out
nt secondary err code: 60
nt OS err code: 0
***Client address: (ADDRESS=(PROTOCOL=tcp)(HOST=121.23.142.141)(PORT=45679))

The 'nt secondary err code' identifies the underlying network transport, such as (TCP/IP) timeout limit. In the current case 60 identifies Windows underlying transport layer.

The "nt secondary err code" will be different based on the operating system:

Linux x86 or Linux x86-64: "nt secondary err code: 110"
HP-UX : "nt secondary err code: 238"
AIX: "nt secondary err code: 78"
Solaris: "nt secondary err code: 145"

The alert.log message indicates that a connection was terminated AFTER it was established to the instance. In this case, it was terminated 2 hours and 3 minutes after the listener handed the connection to the database.

This would indicate an issue with a firewall where a maximum idle time setting is in place.

The connection would not necessarily be "idle". This issue can arise during a long running query or when using JDBC Thin connection pooling. If there is no data 'on the wire' for lengthy

periods of time for any reason, the firewall might terminate the connection.

SOLUTION

The non-Oracle solution would be to remove or increase the firewall setting for maximum idle time. In cases where this is not feasible, Oracle offers the following suggestion:

The following parameter, set at the **RDBMS_HOME/network/admin/sqlnet.ora, can resolve this kind of problem. DCD or SQLNET.EXPIRE_TIME can mimic data transmission between the server and the client during long periods of idle time.

SQLNET.EXPIRE_TIME=n Where is a non-zero value set in minutes.

See the following : Note 257650.1 Resolving Problems with Connection Idle Timeout With Firewall

**In an installation that includes GRID, this parameter should be set in the RDBMS_HOME/network/admin/sqlnet.ora file. This would be the default location for sqlnet.ora file parameters referenced by the instance.

Please consider your business requirement for allowing connections to remain or appear 'idle' before implementing these suggestions.

REFERENCES

NOTE:257650.1 - Resolving Problems with Connection Idle Timeout With Firewall
NOTE:1286376.1 - Fatal NI Connect Error 12170, 'TNS-12535: TNS:operation timed out' Reported in 11g Alert Log

Doc ID 1335630.1

Doc ID 1286376.1

configuration on ODA :

[oracle@primoda1 ~]$ cat /u01/app/oracle/product/11.2.0.3/dbhome_1/network/admin/sqlnet.ora

SQLNET.EXPIRE_TIME=10

SQLNET.INBOUND_CONNECT_TIMEOUT=2000

DIAG_ADR_ENABLED = OFF

DIAG_ADR_ENABLED_LISTENER = OFF

Thoughts of Oracle !

Sunday, March 29, 2015

Saturday, March 28, 2015

12170 TNS-12535/TNS-00505: Operation Timed Out error on alert log after Firewall Implementation

About Me

Blog Archive