Ahoy there! This is my personal blog which I use as my memory extension and a medium to share stuff that could be useful to others.

Middleware Archives

How to Install PHP with FreeTDS on Linux

There are PHP applications which use MSSQL as the back-end database, and such applications require FreeTDS to enable PHP code to interface with MSSQL. This article describes how to install PHP and FreeTDS on Linux hosts.

To compile or not?: Typically, it is recommended to use package managers like yum to install software on Linux platforms. Package managers simplify installation and administration (e.g. updates) of the software. However, when the software requires special options to be set or modules/extensions to be enabled, it may be difficult to obtain pre-built packages that suit those requirements. In such cases, you will need to compile the software from source. If you do compile, it is prudent to organize all compiled software in standard locations on the host.

Given below are the implementation steps that were used for installing PHP 5.3.3 and FreeTDS 0.91 on RHEL 6.2.

NOTE: All commands in the examples below must be executed with root privileges, unless otherwise stated.

STEP 1: Create Installation Directory Structure

  • Use the following command to create an appropriate directory structure for compiled software:
mkdir /opt/src
  • Use a standard location (e.g. /opt) to install all software compiled from source to facilitate administration (re-compilation, removal, etc.)

STEP 2: Download and Unpack Software Source

  • Download software sources (typically *.tar.gz files) and place them in /opt/src
  • Unpack source software (*.tar.gz) as per the following examples:
tar xfz php-5.3.3.tar.gz
tar xfz freetds-0.91.tar.gz

The above commands will create directories /opt/src/php-5.3.3 and /opt/src/freetds-0.91

STEP 3: Compile and Build FreeTDS

Compile and build FreeTDS as per the example below:

cd /opt/src/freetds-0.91
./configure --prefix=/opt/freetds-0.91
make
make install

NOTE: In order to facilitate administration, you may create a soft link as follows:

cd /opt
ln -s freetds-0.91 freetds

STEP 4: Compile and Build PHP

Compile and build PHP as per the example below:

cd /opt/src/php-5.3.3
./configure --prefix=/opt/php-5.3.3 --with-config-file-path=/opt/php-5.3.3
make
make install

NOTE: In order to facilitate administration, you may create a soft link as follows:

cd /opt
ln -s php-5.3.3 php

STEP 5: Compile and Build the PHP Sybase Extension

PHP requires the sybase_ct extension to allow PHP code to interface with MSSQL. You may compile and build the sybase_ct extension as follows:

cd /opt/src/php-5.3.3/ext/sybase_ct
sh ../../scripts/phpize
./configure --prefix=/opt/php --with-php-config=/opt/php/bin/php-config --with-sybase-ct=/opt/freetds
make
make install

STEP 6: Enable the PHP Sybase Extension

Enable the sybase_ct extension by adding the following line to /opt/php/php.ini

extension=sybase_ct.so
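A make install of PHP does not create a php.ini. If /opt/php/php.ini does not exist yet, you can copy one of the templates shipped with the PHP source tree first, for example (a sketch; adjust the paths to your layout):

# Copy the production template shipped with the PHP 5.3 source to the configured config-file path
cp /opt/src/php-5.3.3/php.ini-production /opt/php/php.ini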

STEP 7: Verify the PHP (with FreeTDS) Installation

If PHP and the sybase_ct extension have been successfully installed, you should be able to view the sybase_ct module when displaying the PHP configuration information as shown below:

Execute the following command (any user):

php -i | grep sybase_ct

If you see "sybase_ct" in the output, then it means that PHP and the sybase_ct extension have been successfully installed.

NOTE: Since PHP and FreeTDS have been compiled from source and installed in non-standard locations, you must add /opt/php/bin:/opt/freetds/bin to a user’s PATH environment variable.
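To connect to an actual MSSQL instance, FreeTDS will also need the database server defined in its freetds.conf, and connectivity can be verified with the tsql utility bundled with FreeTDS. A minimal sketch is given below; the server name, host, port, TDS version and credentials are placeholders for your environment:

# Append a (placeholder) server entry to the FreeTDS configuration
cat >> /opt/freetds/etc/freetds.conf << 'EOF'
[MYMSSQL]
        host = mssql.example.com
        port = 1433
        tds version = 7.0
EOF

# Verify connectivity with the tsql client shipped with FreeTDS
/opt/freetds/bin/tsql -S MYMSSQL -U myuser -P mypassword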


How to install Tomcat 6 on RHEL 6

Installing software on RHEL platforms using yum is straightforward. However, based on your environment, there could be a few more steps to get there. So, here’s what I did to install Tomcat 6 on RHEL 6.2:

Environment:

OS: Red Hat Enterprise Linux Server release 6.2 (Santiago)

Yum Repos: Red Hat Enterprise Linux Server (v. 6 for 64-bit x86_64), RHN Tools for RHEL (v. 6 for 64-bit x86_64)

Implementation:

STEP 1: Install the Tomcat6 Web Servlet container

sudo yum groupinstall web-servlet

STEP 2: Enable the Tomcat6 service

sudo chkconfig tomcat6 on

STEP 3: Change ownership for the Tomcat6 resources

When tomcat6 is installed via STEP 1, a user and group with the same name (tomcat) are created. For security, the user is created without an interactive login shell (/sbin/nologin). So, in order to ensure that the application support individuals don’t require root privileges, you must do the following:

 

sudo chown -R tomcat:tomcat /usr/share/tomcat6
sudo chown -R tomcat:tomcat /etc/tomcat6/*

 

NOTE: By default, the tomcat user is created with umask 022 and so individual accounts will require sudo privileges to modify the resources owned by tomcat. This also ensures that all operations on tomcat resources are audited.

 

STEP 4: Test the Tomcat6 service

After doing STEPS 1-3, I started/stopped tomcat6 using the following commands:

sudo service tomcat6 start
sudo service tomcat6 stop
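With the service started, a quick way to confirm that Tomcat is responding (assuming the default HTTP connector on port 8080):

curl -I http://localhost:8080/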

Tomcat6 started and stopped successfully (and http://localhost:8080 was accessible), but the following two messages in catalina.out bugged me:

INFO: The APR based Apache Tomcat Native library which allows optimal performance in production environments was not found on the java.library.path: /usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64/jre/lib/amd64/server:/usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64/jre/lib/amd64:/usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64/jre/../lib/amd64:/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib

SEVERE: destroyMBeans: Throwable javax.management.MalformedObjectNameException: Cannot create object name for org.apache.catalina.connector.Connector@290fd7f6 at org.apache.catalina.mbeans.MBeanUtils.createObjectName(MBeanUtils.java:764) at org.apache.catalina.mbeans.MBeanUtils.destroyMBean(MBeanUtils.java:1416)


The first INFO message was logged whenever tomcat6 was started and the SEVERE message was logged whenever tomcat6 was stopped.

Getting rid of the INFO message requires installing the Tomcat Native Library (see STEP 5) and it’s recommended that you do this for optimal performance (native code is faster than Java bytecode).

Regarding the SEVERE message, it seems to have been fixed in Tomcat 6.0.25 (refer to the Tomcat 6 bug report) and the version I installed using the above steps was 6.0.24. As this error is harmless, I’d wait for 6.0.25.

 

STEP 5: Install the Tomcat Native Library

Unfortunately, the standard RHEL yum repos used in our company (see Environment above) did not contain packages for the Tomcat Native Library. So, here’s what I did to install the library:

  • Install the pre-requisite packages
sudo yum install apr apr-devel java-1.6.0-openjdk-devel.x86_64 openssl-devel.x86_64

NOTE: I required only the above packages, but your requirement may vary based on your existing OS installation.

  • Download the tomcat native library from here
  • Execute the following commands:
# Extract downloaded tar
tar xvzf tomcat-native-1.1.22-src.tar.gz

# Configure
cd tomcat-native-1.1.22-src/jni/native
sudo ./configure \
--with-apr=/usr/bin/apr-1-config \
--with-java-home=/usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64 \
--with-ssl=/usr/include/openssl \
--prefix=/usr/lib64

# Make
sudo make

#Make Install
sudo make install

# The steps above installed the library in /usr/lib64/lib. As the default library search
# path includes /usr/lib64 but not /usr/lib64/lib, you may either change the path or set up links as shown below:

cd /usr/lib64
sudo ln -s lib/libtcnative-1.so.0.1.22 libtcnative-1.so
sudo ln -s lib/libtcnative-1.so.0.1.22 libtcnative-1.so.0

Successful installation of the Tomcat Native library will show something similar to the following in catalina.out when you start the tomcat6 service:

INFO: Loaded APR based Apache Tomcat Native library 1.1.22.

I observed that the Tomcat Native Library made quite an improvement to the tomcat6 server start time. Prior to installation, tomcat 6 started in about 144ms and after installation, it took only around 77ms!


How to build AMP from source on RHEL 5.7

Typically, building a LAMP system on RHEL may be performed with yum installs. However, I wanted specific options built into my AMP and I wanted to install the software in specific locations. Hence, I opted to compile from source. It ain’t scary, but it took me a few iterations to get stuff sorted out, and this article describes what I did:

My LAMP System:

  • L – RHEL 5.7 (kernel 2.6.18-274.3.1.el5)
  • A – Apache 2.2.20
  • M – MySQL 5.5.15
  • P – PHP 5.3.8

STEP 1: Install Apache HTTP

Pre-requisites:

  • Create a user for Apache. This user will be used to launch the httpd child processes (assuming that the root user will launch the parent process to listen on port 80 or any port < 1024). I created a user called apache as shown below (command executed as the root user):

    useradd -c "Apache HTTP" -s /bin/bash -m apache
  • Select a location to install apache and ensure that the user created in the above step has appropriate privileges. I executed the following commands as the root user:

    mkdir /opt/apache-2.2.20
    chown -R apache:apache /opt/apache-2.2.20

Installation:

As the apache user, I executed the following:

tar -xvzf httpd-2.2.20.tar.gz
cd httpd-2.2.20
./configure --prefix=/opt/apache-2.2.20 --enable-so
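The notes above end at configure; the build would normally be completed with the usual make sequence (still as the apache user, since /opt/apache-2.2.20 is owned by it). A sketch:

make
make install

# Optional sanity check of the installed binary
/opt/apache-2.2.20/bin/httpd -v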

STEP 2: Install MySQL

Pre-requisites:

  • Create a user for MySQL. This user will be used to launch the mysqld process. I created a user called mysql as shown below (command executed as the root user):

    useradd -c "MySQL Admin" -s /bin/bash -m mysql
  • Select a location to install mysql and ensure that the user created in the above step has appropriate privileges. I executed the following commands as the root user:

    mkdir /opt/mysql-5.5.15
    chown -R mysql:mysql /opt/mysql-5.5.15
  • You may have to install some packages to build MySQL. I installed packages as per the following command (executed as the root user):

    yum install gcc gcc-c++.x86_64 cmake ncurses-devel libxml2-devel.x86_64

Installation:

As the mysql user, I executed the following:

tar -xvzf mysql-5.5.15.tar.gz
cd mysql-5.5.15
cmake . -DCMAKE_INSTALL_PREFIX=/opt/mysql-5.5.15 -DSYSCONFDIR=/opt/mysql-5.5.15
make
make install
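After make install, a source-built MySQL 5.5 typically still needs its system tables initialized and the server started. A hedged sketch as the mysql user (paths assume the prefix used above; the data directory location is an assumption):

cd /opt/mysql-5.5.15
./scripts/mysql_install_db --basedir=/opt/mysql-5.5.15 --datadir=/opt/mysql-5.5.15/data
./bin/mysqld_safe --basedir=/opt/mysql-5.5.15 --datadir=/opt/mysql-5.5.15/data &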

STEP 3: Install PHP

Pre-requisites:

  • Select a location to install php and ensure that the appropriate user (the web server user, e.g. apache) created in STEP 1 has the required privileges. I executed the following commands as the root user:

    mkdir /opt/php-5.3.8
    chown -R apache:apache /opt/php-5.3.8
  • As I needed a few packages for the phpMyAdmin application and other bespoke PHP applications, I did the following (I used a combination of yum and rpm as I did not find all packages in my yum repositories; the mcrypt build itself is completed as sketched after these commands):

    # As root user
    rpm -ivh libmcrypt-2.5.7-1.2.el5.rf.x86_64.rpm
    rpm -ivh libmcrypt-devel-2.5.7-1.2.el5.rf.x86_64.rpm
    rpm -ivh mhash-0.9.9-1.el5.rf.x86_64.rpm
    yum install php53-mbstring.x86_64 bzip2 bz2 libbz2 libbz2-dev autoconf
    tar -xvzf mcrypt-2.6.8.tar.gz
    cd mcrypt-2.6.8
    ./configure --disable-posix-threads --prefix=/opt/mcrypt
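The mcrypt configure step above would then be followed by the usual build and install (not shown in the original notes):

make
make install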

Installation:

As the apache user, I executed the following:

tar -xvzf php-5.3.8.tar.gz
cd php-5.3.8
./configure --prefix=/opt/php-5.3.8 --with-apxs2=/opt/apache-2.2.20/bin/apxs --with-config-file-path=/opt/php-5.3.8 --with-mysql=/opt/mysql-5.5.15 --with-bz2 --with-zlib --enable-zip --enable-mbstring --with-mcrypt
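As with the other builds, the configure step above would be followed by make and make install. In addition, Apache needs a handler mapping so that .php files are passed to the PHP module (the apxs-based install adds the LoadModule line automatically); a sketch:

make
make install

# Map .php files to the PHP module (append to /opt/apache-2.2.20/conf/httpd.conf)
echo 'AddType application/x-httpd-php .php' >> /opt/apache-2.2.20/conf/httpd.conf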

WebLogic Server crashes due to 2 GB stdout file

Problem:

A WebLogic managed Server crashed with no relevant information whatsoever in the logs. The server was started by a Node Manager.

Background & Analysis:

WebLogic Version: 8.1 SP6 (cluster with 2 managed servers)
JVM version: 32-bit JRockit R27.6 1.4.2_21
Operating System : 64-bit RHEL 4.0 AS update 7 (kernel 2.6.9)
When the server crashed, the server and stderr logs had no clues regarding the cause of the crash. However, the stdout log was 2147483647 bytes (2 GB) as it was not rotated. The last modified time of the stdout file was the same as the time when the server crashed. The very same scenario was observed when the other server in the cluster crashed. The filesystem is large-file aware.

Solution:

Rotate and archive the stdout file, so that the JVM running WebLogic does not crash when stdout reaches 2 GB in size.

NOTE: All logs (server, stderr, stdout, application) must be effectively rotated and archived. I’ve seen several enterprise environments fall victim to a lack of log housekeeping. To rotate files like the JVM’s stdout and stderr, it’s best to use the copy-truncate method (make a copy of the existing file and then truncate it) as the JVM will still have a file descriptor open for the file. You may lose a tiny amount of log information using this method, but it’s less harmful than your server crashing. Removing or renaming a file with an open file descriptor will only make the problem invisible to you, as the JVM will still be writing to the old file descriptor and growing a file in a location other than your logs directory (visible under /proc).
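For example, a minimal logrotate configuration using the copy-truncate method for a managed server’s stdout file might look like the following (the path, file name and rotation policy are illustrative only):

# /etc/logrotate.d/weblogic-stdout (hypothetical file)
/software/weblogic/logs/managed1.out {
    daily
    rotate 14
    compress
    copytruncate
    missingok
}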

Root Cause:

The JVM’s stdout file reached 2GB in size.

 

NOTE:
(1) The solution above describes a successful problem-solving experience and may not be applicable to other problems with similar symptoms.
(2) Your rating of this post will be much appreciated. Also, feel free to leave comments.

 


DFC errors on WebLogic 8.1 SP6 Cluster

Problem:

When a clustered application deployed on a WebLogic 8.1 SP6 cluster tries to connect to a Documentum Content Server during load tests, one or more servers in the WebLogic cluster crash with a core dump. Given below is an extract of the exception stack trace from a JRockit dump file (jrockit..dump):

Stack 0: start=0xf3068000, end=0xf308c000, guards=0xf306d000 (ok), forbidden=0xf306b000 Thread Stack Trace: at (???.c)@0x32553338
— Java stack —
at com/documentum/dmcl/Dmcl40.get(Ljava/lang/String;)Ljava/lang/String;(Native Method)
at com/documentum/fc/connector/DfConnection.apiGet(DfConnection.java:180)
^– Holding lock: com/documentum/fc/connector/DfConnection@0x1c4a3970[thin lock]
at com/documentum/fc/connector/DfConnection.(DfConnection.java:155)
at com/documentum/fc/connector/DfConnectionFactory.getConnection(DfConnectionFactory.java:25)
at com/documentum/fc/client/DfClientSupport.getConnection(DfClientSupport.java:623)
at com/documentum/fc/client/DfClientSupport.newSession(DfClientSupport.java:183)
at com/documentum/fc/client/DfSessionManager.newManualSession(DfSessionManager.java:753)
^– Holding lock: com/documentum/fc/client/impl/session/IdentityContext@0x1ce1a2b0[thin lock]
at com/documentum/fc/client/DfSessionManager.createSessionHelper(DfSessionManager.java:627)
at com/documentum/fc/client/DfSessionManager.getSession(DfSessionManager.java:559)
^– Holding lock: com/documentum/fc/client/DfSessionManager@0x1cdfb7c0[thin lock]
at com/documentum/fc/client/DfSessionManager.getSession(DfSessionManager.java:362)

Background & Analysis:

WebLogic Version: 8.1 SP6 (cluster with 2 managed servers)
JVM version: JRockit R27.6 1.4.2_21
DFC version: 5.3 SP6
The JVM crashed whenever the application, under load, attempted to connect to the Documentum Content Server via the Documentum Foundation Classes (DFC) API. When we ran only one server in the cluster (kept the other server in shutdown state), the running server worked fine with no crashes and DFC errors.

Solution:

Upgrade DFC to version 6.0 or higher.

Workaround: If you cannot upgrade to DFC 6 or higher for some reason, then as a workaround, ensure that you have some means of "priming" your application. i.e. initiate connections (to Documentum) from your application (unit testing) on all WebLogic servers within the cluster before your application takes on load. We had to use this workaround as WebLogic 8.1 SP6 is not supported on JVM 1.5+ and DFC 6+ is only supported on JVM 1.5+.

Root Cause:

DFC 5.3 SP6 is not compatible with WebLogic clusters. It is supported only on a single WebLogic node. DFC versions 6 and higher are supported on WebLogic clusters.

 

NOTE:
(1) The solution above describes a successful problem-solving experience and may not be applicable to other problems with similar symptoms.
(2) Your rating of this post will be much appreciated. Also, feel free to leave comments.

 


Problem:

Messaging Bridge on WebLogic Server 8.1 does not start. Following errors seen in server log:

####<Jan 27, 2010 10:10:13 AM GMT> <Info> <MessagingBridge> <myhost> <managed1> <ExecuteThread: ‘4’ for queue: ‘MessagingBridge’> <<WLS Kernel>>
<> <BEA-200032> <Bridge "MyBridge" is configured to disallow degradation of its quality of service in cases where the configured quality of service is unreachable.>
####<Jan 27, 2010 10:10:13 AM GMT> <Error> <MessagingBridge> <myhost> <managed1> <ExecuteThread: ‘4’ for queue: ‘MessagingBridge’> <<WLS Kernel>> <> <BEA-200025> <Bridge "MyBridge" failed to start, because the quality of service configured (Exactly-once) is unreachable. This is likely due to an invalid configuration or adapter limitations.>
####<Jan 27, 2010 10:10:13 AM GMT> <Info> <MessagingBridge> <myhost> <managed1> <ExecuteThread: ‘4’ for queue: ‘MessagingBridge’> <<WLS Kernel>> <> <BEA-200034> <Bridge "MyBridge" is shut down.>

           

NOTE: MyBridge connects a WebLogic JMS destination (source) to an MQ destination (target).

Background & Analysis:

In order for Messaging Bridges on WebLogic 8.1 to use Exactly-Once QOS, the following requirements must be met:

  • Messaging Bridge adapter must be jms-xa-adp.rar and its JNDI name is eis.jms.WLSConnectionFactoryJNDIXA.
  • Connection Factories for source Bridge destinations must be XA-enabled.
  • Connection Factories used for target Bridge Destinations must be XA-enabled.
  • Messaging Bridges must be configured with Exactly-Once QOS.
  • The “QOS Degradation Allowed” checkbox must be unchecked.

With the above, it is recommended that the Messaging Bridges be Synchronous for better performance (fewer transaction commits).

From the log snippet above, you can see that the Messaging Bridge MyBridge could not start because the QOS (Exactly-Once) was unreachable and the Bridge was not allowed to degrade its QOS.

The QOS will typically be unreachable due to adapter, bridge configuration or bridge destination configuration issues as referred to in the log snippet.

Solution:

Enable XA on the ConnectionFactory for the target Bridge Destination on MQ.

In order to satisfy the Exactly-Once QOS, both source and target destination connection factories must be XA-enabled.

Root Cause:

The ConnectionFactory for the target Bridge Destination on MQ was configured as non-XA, thereby preventing the Messaging Bridge from initiating an XA connection from the WebLogic Bridge destination to the MQ Bridge destination. Since the Messaging Bridge was not allowed to lower its QOS to make the connection, it failed to start properly.

NOTE: If QOS degradation were allowed, the Messaging Bridge would connect to the MQ destination even if the ConnectionFactory for the Bridge Destination on MQ were non-XA. However, the choice of QOS must be driven by business requirements and not by technical workarounds.

 Reference: Oracle Documentation

NOTE:
(1) The solution above describes a successful problem-solving experience and may not be applicable to other problems with similar symptoms.
(2) Your rating of this post will be much appreciated. Also, feel free to leave comments.

 


Problem:

WebLogic Administration Server (with WebLogic Integration) 8.1 does not start. Following errors seen in stdout/stderr/server logs:

####<Jan 25, 2010 4:30:26 PM GMT> <Error> <JDBC> <myhost> <myadmin> <main> <<WLS Kernel>> <> <BEA-001151> <Data Source "cgDataSource" deployment failed with the following error: null.>

####<Jan 25, 2010 4:30:26 PM GMT> <Info> <JDBC> <myhost> <myadmin> <main> <<WLS Kernel>> <> <BEA-001156> <Stack trace associated with message 001151 follows:

weblogic.common.ResourceException

        at weblogic.jdbc.common.internal.DataSourceManager.createDataSource(DataSourceManager.java:264)

####<Jan 25, 2010 4:30:30 PM GMT> <Error> <WLW> <myhost> <myadmin> <main> <<WLS Kernel>> <> <000000> <Failed to obtain connection to datasource=cgDataSource, using generic DB properties>

####<Jan 25, 2010 4:30:31 PM GMT> <Error> <WLW> <myhost> <myadmin> <main> <<WLS Kernel>> <> <000000> <Error in startup class com.bea.wli.store.DocumentStoreSetup Method: init:

java.lang.IllegalStateException: Unable to start DocumentStore:  com.bea.wli.store.DocumentStoreException: Could not find SQL Document Store cgDataSource

            .

            .

Background & Analysis:

WebLogic Integration (WLI) is a software Business Process Integration framework that runs on WebLogic Server. WLI also includes a console application (wliconsole) to manage WLI configuration. This console application is deployed on the WebLogic Administration Server. Since the console application interacts with the database, it uses default data sources and connection pools (e.g. cgDataSource and cgConnectionPool) for database connectivity.

The errors above indicate that the cgDataSource failed to deploy and consequently, a startup class could not obtain connections to the database, thereby failing deployment and preventing the Administration Server from starting.

Data Sources use Connection Pools to obtain database connections.

Solution:

Ensure that the Connection Pool for the cgDataSource is configured properly (correct JDBC driver, URL, credentials, etc.) and targeted/deployed on the Administration Server (not just the cluster).

Root Cause:

The connection pool (cgConnectionPool) for the data source cgDataSource was not deployed on the Administration Server.

 

NOTE:
(1) The solution above describes a successful problem-solving experience and may not be applicable to other problems with similar symptoms.
(2) Your rating of this post will be much appreciated. Also, feel free to leave comments.

 


How to encrypt passwords for WebLogic 9.x+?

In WebLogic versions prior to 9, JSAFE decryption exceptions and password encryption could be resolved/performed by simply setting plain-text passwords in config.xml as described here.

However, in WebLogic 9.x and later, setting plain-text passwords in config.xml (in Production mode environments) will throw the following error:

<Critical> <WebLogicServer> <BEA-000362> <Server failed. Reason: [Management:141266]Parsing Failure in config.xml: java.lang.IllegalArgumentException: In production mode, it’s not allowed to set a clear text value to the property: PasswordEncrypted of ServerStartMBean>

So, for WebLogic 9.x+, if you suspect that a password in config.xml is invalid, you will need to encrypt the plain-text password and configure it in config.xml as follows:

STEP 1: Encrypt the password you wish to change in config.xml.

Example: You experience JSAFE exceptions with the password of one of your connection pools and you suspect that the encrypted password in config.xml is corrupt.

  • Source the WebLogic environment (to set CLASSPATH and other variables) as follows:
cd <domain-dir>/bin
. ./setDomainEnv.sh

  • Encrypt the plain-text password as follows:
cd <domain-dir>
java weblogic.security.Encrypt <password>

STEP 2: Update the config files (config.xml, jdbc, etc.) with the encrypted password obtained above.
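For example, an end-to-end sketch (the domain path, password and encrypted output below are illustrative only):

cd /path/to/mydomain/bin
. ./setDomainEnv.sh
cd /path/to/mydomain
java weblogic.security.Encrypt mypassword
# Illustrative output: {3DES}0crs6WA8IJIgpXz9qQ2nKg==
# Place the resulting string in the relevant element, e.g.
# <password-encrypted>{3DES}0crs6WA8IJIgpXz9qQ2nKg==</password-encrypted>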

NOTE: The above encryption procedure will work even with earlier versions of WebLogic (e.g. 8.1).


First and foremost, if feasible, please consider a Hotspot JVM upgrade. The 1.4.2 JVM is outdated and there have been several significant improvements to GC ergonomics in later Hotspot JVMs. The JDK 1.4.2 Standard Edition reached its EOL in late 2008. However, you can purchase support for later updates of JDK 1.4.2 (later than 1.4.2_19), branded under Java For Business (JFB) v1.4.2, to receive security updates and critical fixes until 2013. With regard to CMS, Sun engineers were finding their feet with CMS in JVM 1.4.2, enhanced it considerably in JVMs 5 and 6, and are about to replace CMS with G1 (Garbage First) in JVM 7.

For those of you still using the Concurrent Mark Sweep (CMS) garbage collector with Hotspot JVM 1.4.2, here’s some information which I hope you will find useful.

What is CMS garbage collector?

The Concurrent Mark Sweep (CMS) garbage collector (actually mostly concurrent) is a non-default old (tenured) generation garbage collector, introduced in JVM 1.4, that manages a JVM’s heap by collecting garbage in the old (tenured) generation in a number of phases, some of which are concurrent and others stop-the-world. As a consequence of the concurrent phases, this collector minimizes GC pauses and hence is also referred to as the low pause collector.

When would you use the CMS Collector?

When you want to minimize GC pauses (a frequent requirement for websites and interactive applications).

Does CMS require more cpu and memory than the default collectors?

Yes. You need adequate CPU resources (multiple processors) as CMS adds CPU overhead by using a background thread for the concurrent phases. You need adequate memory to allocate slightly larger heaps than you would for the default collectors, as objects will continue to enter the old generation during the mostly concurrent phases.

How do you enable the CMS Collector?

Use the JVM flag -XX:+UseConcMarkSweepGC. When you do this, -XX:+UseParNewGC is implicitly used. However, you may explicitly specify -XX:+UseParNewGC if you wish (but it’s just redundant).

Note: You cannot use the default collector or the parallel scavenge collector (-XX:+UseParallelGC) in the Young Generation when using CMS in the old generation. CMS is tightly coupled with ParNewGC in the young generation. ParNewGC is an enhanced version of the parallel scavenge collector that enables GC to be done in the young generation while CMS is in progress in the old generation.

 Does the CMS Collector require tuning?

You’re lucky if the CMS collector gives you optimal performance just by enabling it with -XX:+UseConcMarkSweepGC. Every Hotspot 1.4.2 JVM I’ve come across (enterprise systems) has required some CMS tuning for optimal performance. Of course, this requirement depends entirely on your application’s object profile. But if tuning your Hotspot 1.4.2 JVM is required, be aware that tuning CMS takes more effort than tuning the default collectors, and CMS has a bunch of JVM flags to play around with. Some important JVM flags to consider when tuning CMS are given below:
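For reference, a hedged, illustrative set of the flags most often involved when tuning CMS on a 1.4.2 JVM (the values shown are placeholders, not recommendations):

-XX:+UseConcMarkSweepGC                 # enable CMS in the old generation (ParNew implied in the young)
-XX:+CMSParallelRemarkEnabled           # parallelize the stop-the-world remark phase
-XX:NewSize=600m -XX:MaxNewSize=600m    # fix the young generation size
-XX:SurvivorRatio=8                     # override the CMS default of 1024 so the survivor spaces are used
-XX:MaxTenuringThreshold=10             # override the CMS default of 0 to keep short-lived objects young
-XX:CMSInitiatingOccupancyFraction=60   # old generation occupancy at which a CMS cycle starts
-XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:gc.log   # GC logging for analysis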

What do you need to watch out for when using CMS (problems/gotchas)?

  • The dreaded concurrent mode failure – a failure which occurs when CMS’ concurrent GC phases are interrupted for certain reasons and a Serial, Mark-Sweep-Compact GC is required.
  • Default SurvivorRatio=1024 and default MaxTenuringThreshold=0. Note that these are the defaults only when using CMS with JVM 1.4.2 and can trouble you if you’re tuning your JVM for short-lived objects. If your application creates mostly short-lived objects and you wish to use the Young Generation as a filter to retain these objects as long as possible and clean them up with minor GCs (parallel scavenges) to reduce the pressure on the CMS collector, then you must change these defaults, because they ensure that the survivor spaces are not used. Refer to this article to understand the peculiarities of MaxTenuringThreshold.
  • The value set by -XX:CMSInitiatingOccupancyFraction is used as the threshold for both the old and permanent generation occupancies, i.e. a CMS GC will be initiated in the old and permanent generations when either (or both) of the old and permanent generation occupancies exceeds the value of CMSInitiatingOccupancyFraction. This is inconvenient, and it implies that you must also pay close attention to permanent generation occupancy and size the permanent generation appropriately.
  • The -XX:+CMSClassUnloadingEnabled -XX:+CMSPermGenSweepingEnabled JVM flags could increase remark pauses, but can contain permanent generation growth (prevent OutOfMemory errors caused by a full permanent generation) and protect against poor GC in old generation (objects in the old generation that are referenced by classes in the permanent generation will not be collected until the classes in the permanent generation are collected).

What tools are available to assist you with JVM tuning?

First and foremost, before tuning your 1.4.2 JVM, ensure you profile your application and resolve application issues (e.g. memory leak). To tune your JVM, you have tools which broadly fall under two categories:

Runtime Monitoring: These tools attach to your JVM to provide loads of runtime data. They are useful when you’re monitoring your JVM at runtime and using them interactively. An excellent tool is VisualVM and its VisualGC plugin. A screenshot of the VisualGC plugin is given below:

VisualVM_VisualGC

GC Log File Analysis: These tools enable you to do offline analysis of GC log files and prepare reports for trend analysis. If you wish to run a load test for a couple of hours and measure performance after the test, then you will need to capture GC logfiles and analyze them with such tools. In this area, Sun has been lacking good tools. HP chose a wise strategy of defining a specific format for the GC logfiles generated by HP JDKs and developing the excellent HPjmeter to parse the logfiles and create fancy charts along with several metrics. The Sun forums indicate that a GCHisto plugin is being developed to analyze Hotspot GC log files. I have tried out this plugin (beta) and found it to be nowhere near as comprehensive and sophisticated as HPjmeter. I will wait for GCHisto to be completed and plugged into VisualVM before trying it out again.

In order to assist my colleagues and me with Hotspot JVM 1.4.2 GC log file analysis when using CMS, I’ve developed a quick-and-dirty Korn shell script that provides a summary of some key GC metrics for a given GC logfile. You may download the script CMSGCStats.ksh and execute it without arguments (ksh CMSGCStats.ksh) for usage tips. A sample screenshot of the script’s output is shown below:

 

CMSGCStats

References:

(1) JVM 1.4.2 Garbage Collectors

 

COLLECTOR         | GENERATION | JVM FLAGS TO TURN ON    | DESCRIPTION
Serial            | Young      | None (default)          | Single-threaded, stop-the-world, copying collector
Parallel Scavenge | Young      | -XX:+UseParallelGC      | Multi-threaded, stop-the-world, copying collector (not to be used with CMS)
ParNew            | Young      | -XX:+UseParNewGC        | Multi-threaded, stop-the-world, copying collector to be used along with CMS. This option is automatically turned on when using CMS and doesn't have to be explicitly specified.
Serial Old        | Old        | None (default)          | Single-threaded, stop-the-world, mark-sweep-compact collector
CMS               | Old        | -XX:+UseConcMarkSweepGC | Mostly concurrent, low-pause collector that uses a background thread for the concurrent phases.


The MaxTenuringThreshold for a Hotspot JVM

What is MaxTenuringThreshold?:

In a Sun Hotspot JVM, objects that survive Garbage Collection in the Young Generation are copied multiple times between Survivor Spaces before being moved into the Tenured (Old) Generation. The JVM flag that governs how many times the objects are copied between the Survivor Spaces is MaxTenuringThreshold (MTT) and is passed to a JVM as -XX:MaxTenuringThreshold=n, where n is the number of times the objects are copied. The default value of ‘n’ is (or actually was) 31.
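As an illustration, MTT is typically set alongside the flags used to observe tenuring behaviour (a sketch; the heap sizes and the application class name are placeholders):

java -Xms2048m -Xmx2048m \
     -XX:+UseConcMarkSweepGC \
     -XX:MaxTenuringThreshold=10 \
     -XX:+PrintTenuringDistribution \
     -Xloggc:gc.log \
     MyApp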

Setting the MaxTenuringThreshold:

A few years ago, while my colleagues and I were tuning a 1.4.2_11 Hotspot JVM using flags like PrintTenuringDistribution and tools like visualgc, we found that setting MTT=10 along with other flags gave us the best results (JVM throughput, pause time, footprint). However, recently, when tuning a 1.4.2_11 Hotspot JVM for another application that had mostly short-lived objects, I suggested testing a value of MTT=80 (I still have no idea how the value 80 came to my mind), which is ridiculous as you’ll soon see. My objective was to retain the short-lived objects for as long as possible in the Young Generation to allow them to be collected by Minor GCs as opposed to the Full GCs in the tenured generation. Anyway, all our performance tests of the application on JVM 1.4.2_11 with MTT=80 and other JVM flags showed a significant improvement in JVM performance compared to before (when it was untuned).

Last week, I came across some interesting proposals discussed among Sun engineers last year, regarding modifying the way MTT is handled by the JVM. I don’t know whether the proposals have been implemented, but they give some good insight into how MTT works. To quote those discussions,

Each object has an "age" field in its header which is incremented every time an object is copied within the young generation. When the age field reaches the value of MTT, the object is promoted to the old generation (I’ve left out some detail here…). The parameter -XX:+NeverTenure tells the GC never to tenure objects willingly (they will be promoted only when the target survivor space is full). (out of curiosity: does anyone actually use -XX:+NeverTenure?) Originally, in the HotSpot JVM, we had 5 bits per object for the age field (for a max value of 31, so values of MTT would make sense if they were <= 31). A couple of years ago (since 5u6 IIRC), the age field "lost" one bit and it now only has 4 (for a max value of 15).

Refer to http://mail.openjdk.java.net/pipermail/hotspot-gc-dev/2008-May/000309.html for more details. So, basically, when I set MTT=80, the JVM would actually have “never tenured” the objects in the Young Generation until the Survivor Spaces were full. I hope Sun has fixed that problem as per the proposal, or at least provides proper documentation (similar to the referenced article) for their JVMs explaining how MTT works. MTT=80 did not have an adverse impact on our application, but we eventually switched to MTT=8 (the value 8 was a guess and didn’t produce very different results). I suggest not setting MTT at first, and using it only if your analysis of GC logs and your requirements indicate that you need to retain short-lived objects in the Young Generation for longer. As a matter of fact, when tuning a JVM, always start with basic flags for heap size and nothing else. Then, based on load tests, customer experience metrics (e.g. response time, response errors) and analysis of GC logs, set JVM flags and retest. Tuning is iterative and, apart from all the tools available, a must-have quality (especially for complex applications) is patience.


Problem:

WebLogic10_ActivateChanges

 

When activating changes (clicking the button "Activate Changes" as shown in the image on the left) on the Administration console of a WebLogic 10.0 MP1 domain comprising an admin server and two managed servers (each managed server on a different host), it took around 5 minutes for the activation to complete.

 

 

 

 

Background:

From WebLogic Server 9.x onwards, any changes performed on the Administration console must go through a three-step process – (1) Lock and Edit (2) Edit config (3) Activate Changes. It’s the third step in this process that took about 5 minutes to complete. The changes were successfully made, albeit after 5 minutes. Interestingly, when we located all the managed servers in the domain on the same host, this problem disappeared and the activation of changes took less than 10 seconds. However, locating all managed servers on one host cannot be a solution. We enabled debug for Deployment on all servers. Given below is the debug output captured during an occurrence of the problem:

 

####<Sep 29, 2009 10:56:45 AM BST> <Debug> <Deployment> <myhost> <myadmin> <[ACTIVE] ExecuteThread: '0' for queue: 'weblogic.kernel.Default (self-tuning)'> <<WLS Kernel>> <> <> <1254218205661> <BEA-000000> <Experienced Exception while c.tryLock() and it is ignored :: java.nio.channels.OverlappingFileLockException
at sun.nio.ch.FileChannelImpl.checkList(FileChannelImpl.java:853)
at sun.nio.ch.FileChannelImpl.tryLock(FileChannelImpl.java:820)
at java.nio.channels.FileChannel.tryLock(FileChannel.java:967)
at weblogic.deploy.internal.targetserver.datamanagement.ConfigDataUpdate.getFileLock(ConfigDataUpdate.java:374)
at weblogic.deploy.internal.targetserver.datamanagement.ConfigDataUpdate.getFileLock(ConfigDataUpdate.java:357)
at weblogic.deploy.internal.targetserver.datamanagement.ConfigDataUpdate.acquireFileLock(ConfigDataUpdate.java:338)
.
.
.

 

Solution:

After liaising with Oracle Support, we upgraded our JVM and the upgrade resolved the problem. After the upgrade, the activation of changes took less than 10 seconds irrespective of whether the managed servers were located on the same host or not. Details of the upgrade are given below:

Old JVM:

java version "1.5.0_14"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_14-b03)
BEA JRockit(R) (build R27.5.0-110_o-99226-1.5.0_14-20080528-1505-linux-x86_64, compiled mode)

 

New JVM:
java version "1.5.0_17"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_17-b04)
BEA JRockit(R) (build R27.6.3-40_BR8141840-120019-1.5.0_17-20090828-1133-linux-x86_64, compiled mode)

 

Root Cause:

Bug in JVM 1.5.0_14 (JRockit R27.5.0)

 

NOTE:

(1) The solution above describes a successful problem-solving experience and may not be applicable to all problems with similar symptoms.

(2) Your rating of this post will be much appreciated. Also, feel free to leave comments.


Problem:

The Administration server of a WebLogic domain comprising WebLogic Server 10.0 and WebLogic Integration 10.2, consumes high CPU and throws java.lang.OutOfMemory errors.

 

Background:

The WebLogic Domain’s admin server had only two web applications deployed on it – the WebLogic Administration console and WebLogic Integration console. After start-up, its CPU utilization gradually increased and reached around 80% within a couple of days. Also, java.lang.OutOfMemory errors were observed in the server logs. This behaviour was observed even when there was no load on the managed servers and the web applications on the admin server were not accessed (all servers idle from a user perspective).

WebLogic Domain details:

Version: WebLogic Server 10.0 MP1, WebLogic Integration 10.2
JVM: JRockit R27.5.0-110 (JRE Standard Edition build 1.5.0_14-b03)
Admin Server JVM Heap: Minimum (Xms) = Maximum (Xmx) = 2 GB
Number of managed servers: 2
Operating System: 64-bit Red Hat Enterprise Linux 5.1
CPU Architecture: AMD64

 

Solution:

The following patches were applied and the problem was resolved. Contact Oracle support or use their Smart Update procedure to obtain the patches.

SL# | PATCH | COMMENTS
1   | D76T  | CR380997 - Admin server gives OOM: closed the Queue and Session objects properly.
2   | LJTR  | CR373884 - Unable to apply some of the patches for jpd.jar when using the "inject" mechanism.
3   | ZSX5  | BUG8174387 - MEMORY LEAK OBSERVED ON ADMIN SERVER (no public details available; patch provided for WLI 10.2).

 

Root Cause:

Known issues with WebLogic Integration 10.2

 

NOTE:

(1) The solution above describes a successful problem-solving experience and may not be applicable to all problems with similar symptoms.

(2) Your rating of this post will be much appreciated. Also, feel free to leave comments.


WebLogic – IP Multicast : A primer

In Oracle WebLogic (formerly BEA WebLogic) versions prior to 10.0, WebLogic Servers relied on IP multicast to maintain cluster membership (versions 10.0 and later provide the alternative of Unicast, which is preferred over Multicast). This article pertains to the IP multicast used by WebLogic.

What is IP multicast?

IP multicast is a technology used to broadcast data (datagrams) across a network using IP. For IP multicasting, certain special IP addresses called multicast addresses are defined: according to the Internet Assigned Numbers Authority (IANA) RFC 3171 guidelines, addresses 224.0.0.0 to 239.255.255.255 are designated as multicast addresses. A multicast address is associated with a group of receivers. When a sender wishes to send a datagram to a group of receivers using IP multicast, it sends the datagram to the multicast address/port associated with that group of receivers. When routers or switches on the network receive the datagram, they know which servers (receivers) are associated with the multicast address (using IGMP), so they make copies of the datagram and send a copy to every registered receiver. This is illustrated in the figure below:

 

HowMulticastingWorks

 

Why does WebLogic use IP multicast?

A WebLogic Cluster is a group of WebLogic servers that provide the same services, offering resilience (if a server crashes and exits the cluster, it can rejoin the cluster later), high availability (if a server in the cluster crashes, other servers in the cluster can continue to provide its services) and load balancing (load can be distributed uniformly across all servers in the cluster) for an application deployed on the cluster. WebLogic makes these clustering features possible by using IP multicast for the following:

(1) Cluster heartbeats: All servers in a WebLogic cluster must always know which servers are part of the cluster. To make this possible, each server in the cluster uses IP multicast to broadcast regular "heartbeat" messages that advertise its availability.

(2) Cluster-wide JNDI updates: Each WebLogic Server in a cluster uses IP multicast to announce the availability of clustered objects that are deployed or removed locally.

 

How does WebLogic use IP multicast?

All servers in a WebLogic Cluster send out multicast fragments (heartbeat messages) from their interface addresses to the multicast IP address and port configured for the WebLogic Cluster. All servers in the cluster are registered with the multicast address and port, so every server in the cluster receives fragments from all other servers in the cluster as well as the fragment it sent out. Since every server in the cluster sends out fragments every 10 seconds, each server can determine, based on the fragments it receives, which servers are still part of the cluster. If a server (say Server A) does not receive a fragment from another server in the cluster within 30 seconds (3 multicast heartbeat failures), then it will remove that server from its cluster membership list. When fragments from the removed server start arriving at Server A again, Server A will add the removed server back to its cluster membership list. In this way, every server in a WebLogic cluster maintains its own cluster membership list. Regarding cluster-wide JNDI updates, each server instance in the cluster monitors these announcements and updates its local JNDI tree to reflect the current deployments of clustered objects.

Note: Clustered server instances also monitor IP sockets as a more immediate method of determining when a server instance has failed.

The figure below illustrates how a WebLogic cluster uses IP multicast.

 

HowWebLogicUsesIPMulticast

 

How do you configure and test multicast for a WebLogic Cluster?

Configuring IP Multicast for a WebLogic Cluster is simple. The steps required are given below:

STEP 1: If your WebLogic cluster is part of a network containing other clusters, obtain a multicast address and port for it from your Network Admins. A multicast address and port combination must be unique for every WebLogic cluster; several WebLogic clusters may share the same multicast address if and only if they use different multicast ports. Typically, in organizations, network admins allocate multicast addresses and ports to projects to ensure there are no conflicts across the network. By default, WebLogic uses a multicast address of 237.0.0.1 and the listen port of the Administration server as the multicast port.

 

STEP 2: Having obtained a multicast address and port for your WebLogic cluster, you must test them before starting your WebLogic cluster to ensure that there are no network glitches and conflicts with other WebLogic clusters. You may do so with the MulticastTest utility provided with the WebLogic installation (part of weblogic.jar). An example test for a cluster containing 2 WebLogic servers on UNIX hosts and using multicast address/port of 237.0.0.1/30000 is given below:

# Command to run on both server hosts (any one of the following within the WebLogic domain directory) to set the WebLogic domain environment
. ./setDomainEnv.sh
. ./setEnv.sh

# Command to run on server 1 (within any directory)
${JAVA_HOME}/bin/java utils.MulticastTest -N server1 -A 237.0.0.1 -P 30000

# Command to run on server 2 (within any directory)
${JAVA_HOME}/bin/java utils.MulticastTest -N server2 -A 237.0.0.1 -P 30000

# NOTE: Both java commands must be run on both WebLogic server hosts concurrently.

 

View screenshots of the tests executed (on Windows Vista) when the WebLogic cluster was running (conflicts between test and running cluster outlined in red) and when the WebLogic cluster was stopped, by clicking on the images below:

 

Screenshots: utils.MulticastTest with the cluster running; utils.MulticastTest with the cluster stopped

 

Note: On Vinny Carpenter’s blog, he mentions a problem when using the utils.MulticastTest utility bundled with WebLogic Server 8.1 SP4. Well, I have never faced any issues with the utils.MulticastTest utility, but I am not sure if I’ve used it with WLS 8.1 SP4.

 

STEP 3: After successfully testing the multicast address and port, you may use the WebLogic Administration Console to configure multicast for the cluster. Descriptions of various multicast parameters are available on the console itself. The three most important parameters are (1) Multicast Address, (2) Multicast Port and (3) Interface Address. The Interface Address may be left blank if the WebLogic servers use their hosts’ default interface. On multi-NIC machines or in WebLogic clusters with Network channels, you may have to configure an Interface Address. Given below is a screenshot from a WLS 8.1 SP6 Administration Console indicating the various multicast parameters that may be configured for a cluster. Note that the interface address is on a different screen as it is associated with each server in the cluster, rather than the cluster itself.

 

ConfigureMulticast

 

After configuring Multicast for a WebLogic cluster, you can monitor the health of the cluster and exchange of multicast fragments among the servers in the cluster by using the WebLogic Administration console. A screenshot of such a monitoring screen with WLS 8.1 SP6 is given below:

 

Monitoring a WebLogic Cluster using the Administration Console

 

Note that the screenshot above indicates that:

(1) All servers are participating in the cluster ("Servers" column).

(2) Every server in the cluster is aware of every other server in the cluster. The "Known Servers" column is especially useful for large clusters to know exactly which servers are not participating in the cluster.

(3) The total number of fragments received by each server (34) is equal to the sum of all the fragments sent by all the servers in the cluster (17 + 17). Note that the "Fragments Sent" and "Fragments Received" columns on the console need not always indicate a correct relationship even if multicast works fine. That’s because these stats on the console are reset to 0 when servers are restarted.

 

Troubleshooting WebLogic’s Multicast configuration

When you encounter a problem with WebLogic multicast (or any problem for that matter), it is important to confirm the problem by executing as many tests as possible and gathering as much data as possible when the problem occurs. For WebLogic multicast, you may confirm the problem by using the MulticastTest utility or checking the Administration console as described above. To troubleshoot WebLogic multicast, refer to the Oracle documentation. Also, check the section below to determine whether the problem you’ve encountered is similar to one of the problems described, which may provide you with a quick resolution.

 

WebLogic Multicast eureka!

Given below are WebLogic multicast problems which I’ve encountered and investigated, along with solutions that worked:

 

PROBLEM 1:

SYMPTOMS: None of the WebLogic servers could see any other server in the cluster. Tests using the MulticastTest utility failed, indicating that all servers could only receive the multicast fragments which they themselves sent out.

ANALYSIS: The MulticastTest utility was tried with the correct multicast address, multicast port and interface address. No conflict with any other cluster was observed, but no messages were received from other servers. Assuming that all servers in the cluster are not hung, the symptoms indicate a problem with the underlying network or the multicast configuration on the network.

SOLUTIONS:

Solution 1: The Network Admin just gave us another multicast address/port pair and multicast tests worked. The multicast address/port pair which failed was not registered correctly on the network.

Solution 2: The Network Admin informed us that more than one switch was used on the cluster network and this configuration did not ensure that multicast fragments sent by one server in the cluster were copied and transmitted to other servers in the cluster. Refer to this CISCO document for details regarding this problem and its solutions. As a tactical solution, the Network Admin configured static multicast MAC entries on the switches (Solution 4 in the CISCO document). This tactical solution requires the Network Admin to maintain those static entries, but since there weren’t too many WebLogic clusters using multicast on the network, this solution was chosen.

Solution 3: The two managed servers in a cluster were in geographically separated data centres and several hops across the network were required for the servers to receive each other’s multicast fragments. Increasing the multicast TTL solved this problem and both the MulticastTest utility and the WebLogic servers successfully multicasted.


PROBLEM 2:

SYMPTOMS: The following errors were seen in the WebLogic managed server logs and the managed servers did not start.

Exception:weblogic.server.ServerLifecycleException: Failed to listen on multicast address
weblogic.server.ServerLifecycleException: Failed to listen on multicast address
        at weblogic.cluster.ClusterCommunicationService.initialize()V
           (ClusterCommunicationService.java:48)
        at weblogic.t3.srvr.T3Srvr.initializeHere()V(T3Srvr.java:923)
        at weblogic.t3.srvr.T3Srvr.initialize()V(T3Srvr.java:669)
        at weblogic.t3.srvr.T3Srvr.run([Ljava/lang/String;)I(T3Srvr.java:343)
        at weblogic.Server.main([Ljava/lang/String;)V(Server.java:32)
Caused by: java.net.BindException: Cannot assign requested address
        at jrockit.net.SocketNativeIO.setMulticastAddress(ILjava/net/InetAddress;)V(Unknown Source)
        at jrockit.net.SocketNativeIO.setMulticastAddress(Ljava/io/FileDescriptor;Ljava/net/InetAddress;)V(Unknown Source)
        .
        .
        .

ANALYSIS: The errors occurred irrespective of which multicast address/port pair was used. The error indicates that the WebLogic server could not bind to an address from which to send datagrams to the multicast address, i.e. it could not bind to its Interface Address.

SOLUTION: The WebLogic server host was a multi-NIC machine and another interface had to be specified for communication with the multicast address/port. Specifying the correct interface address solved the problem.


PROBLEM 3:

SYMPTOMS: The following errors were seen in the WebLogic managed server logs. The managed servers were running, but clustering features (like JNDI replication) were not working.

<May 20, 2008 4:00:58 AM BST> <Error> <Cluster> <kips1host> <kips1_managed1> <[ACTIVE] ExecuteThread: '1' for queue: 'weblogic.kernel.Default (self-tuning)'> <<WLS Kernel>> <> <> <1211252458100> <BEA-000170> <Server kips1_managed1 did not receive the multicast packets that were sent by itself>
<May 20, 2008 4:00:58 AM BST> <Critical> <Health> <kips1host> <kips1_managed1> <[ACTIVE] ExecuteThread: '2' for queue: 'weblogic.kernel.Default (self-tuning)'> <<WLS Kernel>> <> <> <1211252458100> <BEA-310006> <Critical Subsystem Cluster has failed. Setting server state to FAILED.
Reason: Unable to receive self generated multicast messages>

ANALYSIS: The errors above indicate that the WebLogic server kips1_managed1 could not receive its own multicast fragments from the multicast address/port. Probably, the server’s multicast fragments did not reach the multicast address/port in the first place, which points to an issue with the configuration of the interface address, the route from the interface address to the multicast address/port, or the multicast address/port itself (most likely the interface address or route, because if the multicast address/port were wrong, the server would not have received multicast fragments from any other server either, as in PROBLEM 1).

SOLUTION: The WebLogic server kips1_managed1 used the -Dhttps.nonProxyHosts and -Dhttp.nonProxyHosts JVM parameters in its server startup command, and these parameter values did not contain the name of the host on which kips1_managed1 ran. After including the relevant hostname in these parameter values, the errors stopped occurring. I am not sure how these HTTP proxy parameters affected self-generated multicast messages (I will try to investigate this).


PROBLEM 4:

SYMPTOMS: The WebLogic servers were usually (but not always) all part of the cluster; intermittently, servers were removed from and re-added to the cluster, and the LostMulticastMessageCount was increasing for some servers in the cluster. However, tests using the MulticastTest utility (run while the cluster was stopped) were successful.

ANALYSIS: The problem occurred intermittently when the WebLogic servers were running, but never occurred when using the MulticastTest utility. This indicates that the underlying IP multicast works fine and something is preventing the servers in the cluster from IP multicasting properly. Further analysis revealed that the servers had issues with JVM Garbage Collection with long stop-the-world pauses (> 30 secs) during which the JVM did absolutely nothing else other than garbage collection. Also, the times of occurrences of these long GC pauses correlated with the times of increases in LostMulticastMessageCount and removal of servers from the cluster.

SOLUTION: The JVMs hosting the WebLogic servers were tuned to minimize stop-the-world GC pauses to allow the servers to multicast properly. For the specific GC problem I encountered, you may refer to the tuning details here.


Reference: Oracle documentation

 

NOTE:

Your rating of this post will be much appreciated. Also, feel free to leave comments (especially if you have constructive negative feedback).


When using the Concurrent Low Pause or Concurrent Mark Sweep (CMS) garbage collector with a Sun Hotspot JVM, you may observe "concurrent mode failure" errors like the following in your GC logs or stderr:

(concurrent mode failure): 1404669K->602298K(1482752K), 35.9660286 secs] 1988728K->602298K(2096576K), 46.2083962 secs]

A concurrent mode failure indicates that a concurrent garbage collection of the tenured generation did not complete before the tenured generation filled up.

 

Impact of a concurrent mode failure: When a 1.4 JVM encounters a concurrent mode failure, it will trigger the default stop-the-world garbage collector in the tenured generation, resulting in a relatively long pause. In the example error above, note that the concurrent mode failure results in a GC (default stop-the-world) of around 46 seconds.

 

When does it occur?: A concurrent mode failure typically occurs with the following scenarios:

(1) Heavy load: Heavy load on the application running in the JVM, causing a high rate of promotion from the young to the tenured generation, is typically what causes concurrent mode failures.

(2) Young generation guarantee failure: With JVM 1.4, a requirement for successful promotions from the young to the tenured generation is to have, in the tenured generation, a contiguous block of free space equal to the size of the young generation, to cater to the worst case of having to promote all objects. Even if the required space exists in the tenured generation but is not contiguous (i.e. it is fragmented), promotion failures and concurrent mode failures can occur. Fragmentation problems typically manifest themselves after a long period of continuous use, so your application and JVM may run fine for a while and then suddenly (when the tenured generation is too fragmented) exhibit these problems.

 

How to fix it?: To fix these concurrent mode failures with JVM 1.4, typical solutions are:

(1) Increase the size of the JVM heap and consequently the size of the old generation.

(2) Tune the JVM depending on your application profile.

(3) Scale your platform to distribute load and subject each JVM to a smaller load.

 

JVM Tuning: My colleagues and I observed that our HotSpot 1.4.2_11 JVMs running WebLogic 8.1 SP6 threw several concurrent mode failures during peak load, at specific times of the day. The JVM heap for each WebLogic managed server was already quite big at 2 GB (and could not be increased much further anyway, as we were using a 32-bit 1.4.2 JVM on Solaris 9). So, we decided to tune the JVM. Here are the details of our JVM tuning to significantly reduce the number of concurrent mode failures:

 

(1) JVM Options before tuning: Given below are the JVM parameters in use when the problem occurred (WLjvm output):

**************************************************************************************************
JVM DETAILS FOR WEBLOGIC SERVER m1 (PID = 9999)
**************************************************************************************************

VERSION & BUILD
===============

java version "1.4.2_11"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.2_11-b06)
Java HotSpot(TM) Client VM (build 1.4.2_11-b06, mixed mode)

OPTIONS
=======

-Dbt.weblogic.RootDirectory=/software/weblogic
-Dbt.weblogic.RootDirectory=/software/weblogic
-Dweblogic.Name=m1
-Dweblogic.ProductionModeEnabled=true
-Dweblogic.management.server=http://localhost:50000
-Dweblogic.system.BootIdentityFile=/software/weblogic/boot.properties
-XX:+CMSParallelRemarkEnabled
-XX:+JavaMonitorsInStackTrace
-XX:+PrintGCDetails
-XX:+PrintGCTimeStamps
-XX:+UseConcMarkSweepGC
-XX:+UseLWPSynchronization
-XX:+UseParNewGC
-XX:+UsePerfData
-XX:MaxNewSize=600m
-XX:MaxPermSize=128m
-XX:NewSize=600m
-Xloggc:/software/weblogic/logs/GC/m1_200909012300.log
-Xms2048m
-Xmx2048m
-Xoss2m
-Xss2m
-server

**************************************************************************************************

 

(2) GC Analysis: I used PrintGCStats to analyze the GC logs. It is also important to determine the CMSInitiatingOccupancyFraction (the percentage occupancy of the tenured generation at which a CMS GC is triggered). As the JVM options above do not set this parameter, the JVM calculates its own CMSInitiatingOccupancyFraction at runtime. The default for JVM 1.4.2 is 68%, but do NOT assume that the JVM always uses the default if you do not specify this parameter; instead, analyze the GC logs to see when the JVM actually triggers a CMS GC. The script CMSGCStats.ksh (formerly CMSGCTrigger.ksh) may be used for this purpose and also provides other metrics. Using PrintGCStats and CMSGCTrigger.ksh, we determined that a lot of objects were being promoted to the tenured generation and that CMS GC was mostly triggered at around 50% occupancy of the tenured generation, with some occurrences between 60% and 75%. A rough sketch of how to extract the trigger occupancy from a GC log is shown below.
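As a rough sketch only (this is not the actual CMSGCStats.ksh script), the tenured generation occupancy at which each CMS cycle started can be estimated from the CMS-initial-mark entries in the GC log, assuming the usual "CMS-initial-mark: <used>K(<capacity>K)" log format; the log path below is the one from the options above:

# Estimate tenured generation occupancy (%) at each CMS-initial-mark, i.e. when a CMS GC was triggered.
grep 'CMS-initial-mark' /software/weblogic/logs/GC/m1_200909012300.log | \
  sed 's/.*CMS-initial-mark: \([0-9]*\)K(\([0-9]*\)K).*/\1 \2/' | \
  awk '{ printf("%d%%\n", 100 * $1 / $2) }'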

 

(3) Requirements Analysis: Our application mostly created short-lived objects, and very low latency was not critical because the services provided by the application were mostly asynchronous. Hence, our goal was to retain objects in the young generation for as long as possible so that most of them could be garbage collected there, thereby promoting fewer objects to the tenured generation and reducing the probability of filling up the tenured generation during a CMS GC (which causes concurrent mode failures).

 

(4) Implementation:

In order to garbage collect more objects in the Young generation and decrease the number of objects promoted to the tenured generation, we tuned the following parameters:

 

-XX:NewSize, -XX:MaxNewSize: These parameters control the size of the Young generation. Increasing the size of the Young generation increases the probability of objects being collected there. A rule of thumb is to size the Young generation at 3/8 of the maximum heap size (see the quick check after this list).

-XX:MaxTenuringThreshold: This parameter determines how long objects in the Young generation may age (the number of times they are copied between the survivor spaces before being promoted to the Tenured generation). The default is 31. Increasing the value of this parameter increases the probability of objects being collected in the Young generation. Refer to this article for more details.

-XX:TargetSurvivorRatio: This parameter sets the desired percentage of the survivor spaces that may be used before objects are promoted to the Tenured generation. The default is 50. Increasing the value of this parameter increases the probability of objects being collected in the Young generation.

-XX:SurvivorRatio: This parameter is the ratio of the size of Eden to the size of each survivor space (SurvivorRatio = EdenSize / SurvivorSpaceSize) and can therefore be used to set the size of a survivor space. Decreasing the value of this parameter results in larger survivor spaces and increases the probability of objects being collected in the Young generation.
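As a quick check of the 3/8 rule of thumb mentioned above against the 2 GB (-Xmx2048m) heap used here:

# 3/8 of the 2048 MB maximum heap, in MB; this matches the -XX:NewSize=768m and
# -XX:MaxNewSize=768m values in the tuned options further below.
echo $((2048 * 3 / 8))    # prints 768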

 

In order to override the JVM’s choice of when to trigger a CMS GC in the tenured generation, we tuned the following parameters:

 

-XX:CMSInitiatingOccupancyFraction: This parameter sets the threshold percentage occupancy of the tenured generation at which a CMS GC is triggered. The default for JVM 1.4.2 is 68, but the JVM may deviate from this default at runtime. Setting this parameter explicitly tells the JVM when to trigger a CMS GC in the tenured generation.

-XX:+UseCMSInitiatingOccupancyOnly: This parameter tells the JVM to use only the value defined by -XX:CMSInitiatingOccupancyFraction, rather than also trying to calculate the value at runtime.

 

(5) Results: Based on 5 iterations of testing with different combinations of the parameters mentioned above, we obtained the best results with the following JVM parameters:

**************************************************************************************************
JVM DETAILS FOR WEBLOGIC SERVER m1 (PID = 8888)
**************************************************************************************************

VERSION & BUILD
===============

java version "1.4.2_11"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.2_11-b06)
Java HotSpot(TM) Client VM (build 1.4.2_11-b06, mixed mode)

OPTIONS
=======

-Dbt.weblogic.RootDirectory=/software/weblogic
-Dbt.weblogic.RootDirectory=/software/weblogic
-Dweblogic.Name=m1
-Dweblogic.ProductionModeEnabled=true
-Dweblogic.management.server=http://localhost:50000
-Dweblogic.system.BootIdentityFile=/software/weblogic/boot.properties
-XX:+CMSParallelRemarkEnabled
-XX:+JavaMonitorsInStackTrace
-XX:+PrintGCDetails
-XX:+PrintGCTimeStamps
-XX:+UseConcMarkSweepGC
-XX:+UseLWPSynchronization
-XX:+UseParNewGC
-XX:+UsePerfData
-XX:MaxPermSize=128m
-XX:MaxNewSize=768m
-XX:NewSize=768m
-XX:MaxTenuringThreshold=8
-XX:TargetSurvivorRatio=90
-XX:SurvivorRatio=4
-XX:CMSInitiatingOccupancyFraction=55
-XX:+UseCMSInitiatingOccupancyOnly
-Xloggc:/software/weblogic/logs/GC/m1_200909022300.log
-Xms2048m
-Xmx2048m
-Xoss2m
-Xss2m
-server

**************************************************************************************************

 

Graphs of some key GC metrics before (baseline) and after JVM tuning for 4 WebLogic managed servers are given below:

 

[Figure: JVM Tuning to eliminate concurrent mode failures]

 

The above graphs indicate the following:

(1) The number of occurrences of "concurrent mode failures" significantly decreased after JVM tuning, although these errors weren’t eliminated.

(2) The volume of objects (MB) promoted from the young to the tenured generation significantly decreased after JVM tuning, indicating more garbage collection in the young generation.

(3) The GC sequential load (the percentage of total time spent in GC with all application threads suspended) increased after JVM tuning. Although this is not desirable, our application did not require very low latency, so we were willing to accept the increase in GC sequential load as a compromise. In fact, when our JVM changes were rolled out to Production, the GC sequential load was only around 1%; we had simply been cautious and tested our changes in a test environment under very high loads.

NOTE: The tuning exercise described above worked well for us. However, depending on your scenario, you may have to do further tuning. Also, there is only so much you can gain by tuning a JVM, so you must also give due consideration to scaling the platform and/or the application design. In the example above, although JVM tuning helped, the load on our platform was too high and we had to scale the platform by adding more WebLogic managed servers (JVMs). Problems such as "concurrent mode failures" are typically caused by high loads, when the CMS GC cannot keep up with the rate at which objects are allocated in the tenured generation.

REFERENCES:

(1) When the Sum of the parts – Jon Masamitsu

(2) What the Heck’s a Concurrent Mode? – Jon Masamitsu


How to configure Apache 2.x as a Reverse Proxy

Recently, I used a reverse proxy server to work around a constraint. I wanted to host a monitoring dashboard (website) on port 80 of a corporate intranet host, but running Apache (or any software) to listen on port 80 (or any port below 1024) on UNIX requires root privilege. Since the use of root privilege had been significantly restricted by my System Administrators, administering the website would have been very inefficient and cumbersome: my team would have had to raise a formal request to the Sys Admins via Change Management procedures every time Apache needed to be restarted or otherwise administered.

So, I ran the Apache instance hosting the website as a standard user, listening on the non-privileged port 7777, and configured a separate reverse proxy Apache instance to listen on port 80 and forward requests to the instance hosting the website on port 7777. Now we can do whatever administration work is required on the Apache instance hosting the website without depending on the Sys Admins. If the reverse proxy instance goes down, we will still need the Sys Admins to start it, but there is a very low probability of that happening. My use of a reverse proxy server, as described above, is illustrated in the figure below:

 

[Figure: Reverse Proxy]

Given below are the steps I followed to set up a reverse proxy server using Apache 2.2.12 on Solaris 9:

Note: As a prerequisite, Apache 2.2.12 must be compiled with the --enable-so option to enable Apache to load modules dynamically at runtime.
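For reference, a minimal sketch of building Apache with dynamic module (DSO) support; the source directory name is assumed, while the --prefix matches the ServerRoot used in this article, and your actual configure options will likely differ:

# Sketch only: build Apache 2.2.12 with DSO support enabled.
cd httpd-2.2.12
./configure --prefix=/apache-2.2.12 --enable-so
make
make install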

STEP 1: Build and load the Apache proxy modules

Apache requires the mod_proxy and mod_proxy_http modules to serve as a reverse proxy for HTTP requests. The source code (.c files) for these modules is available in the Apache source code repository. Build and install the required proxy modules using the APache eXtenSion (apxs) tool as follows:

# Note that the Apache ServerRoot is /apache-2.2.12 in the examples below. Run the following commands in the source directory containing the relevant ".c" files (httpd-2.2.12/modules/proxy/).
#
# Build and install mod_proxy module. 
/apache-2.2.12/bin/apxs -i -a -o mod_proxy.so -c mod_proxy.c proxy_util.c

# Build and install mod_proxy_http module. 
/apache-2.2.12/bin/apxs -i -a -o mod_proxy_http.so -c mod_proxy_http.c proxy_util.c
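Optionally, you can confirm that the modules were installed and activated. The checks below assume the /apache-2.2.12 ServerRoot from the example; httpd -M requires a syntactically valid httpd.conf:

# The "-a" flag of apxs should have appended LoadModule directives to httpd.conf.
grep -i 'proxy' /apache-2.2.12/conf/httpd.conf

# List the modules Apache will actually load; expect proxy_module and proxy_http_module.
/apache-2.2.12/bin/httpd -M | grep -i 'proxy'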

STEP 2: Configure the required reverse proxy directives

Having extended Apache with proxy functionality in STEP 1, you now need to tell Apache how you wish to use this new functionality. To do this, use the ProxyPass and ProxyPassReverse directives in httpd.conf. Given below are the directives I used in httpd.conf to reverse proxy all HTTP requests arriving on port 80 to the website hosted on Apache at port 7777 (a fuller httpd.conf sketch follows these two directives):

ProxyPass / http://www.mydomain.com:7777/
ProxyPassReverse / http://www.mydomain.com:7777/
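For context, a minimal sketch of how the relevant part of the reverse proxy's httpd.conf might look; Listen 80 and the rest of the standard configuration are omitted, the hostname is the placeholder used above, and the LoadModule lines should already have been added by "apxs -i -a" in STEP 1 (module paths may differ on your build):

# Sketch only: reverse proxy fragment of httpd.conf for the instance listening on port 80.
LoadModule proxy_module      modules/mod_proxy.so
LoadModule proxy_http_module modules/mod_proxy_http.so

# Keep forward proxying disabled; ProxyPass/ProxyPassReverse alone make this a reverse proxy.
ProxyRequests Off

ProxyPass        / http://www.mydomain.com:7777/
ProxyPassReverse / http://www.mydomain.com:7777/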

STEP 3: Restart Apache for the changes to take effect

mrkips@kipsserver: /apache-2.2.12/bin/apachectl -k restart
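Optionally, you may validate the configuration before restarting and then test the proxy end-to-end; the hostname is the placeholder used earlier, and curl is assumed to be available on the host:

/apache-2.2.12/bin/apachectl -t      # check httpd.conf syntax
curl -I http://www.mydomain.com/     # response headers should come back from the backend on port 7777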