Runaway process causes 100% disk utilization

Problem:

On a Solaris 9 host, the /dashboard mountpoint was 100% utilized (as per “df”) and no new files could be created on it.

df output:

cybergavin@myhost:/dashboard> df -h /dashboard
Filesystem             size   used  avail capacity  Mounted on
/dev/vx/dsk/A19278-S01-7uitx-dg/dashboard
                        16G    16G   2.1M   100%    /dashboard

du output:

cybergavin@myhost:/dashboard> du -sk /dashboard
1789259 /dashboard

Background & Analysis:

As you can see above, “du” and “df” report significantly different utilization for /dashboard. The “df” output tells me that I have very little free space (~ 2.1 MB), whereas the “du” output counts only about 1.7 GB in use (1789259 KB), which implies that I should have around 14 GB of free space.
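
As a rough check on those figures, the arithmetic can be sketched in shell. The numbers are copied from the df and du outputs above; the variable names are my own, and the 16G size is the rounded value df printed, so the result is approximate.

```shell
#!/bin/sh
# Figures copied from the df and du outputs above.
fs_size_gb=16          # "size" column from df -h (rounded by df)
du_used_kb=1789259     # result of du -sk /dashboard

# Convert du's kilobytes to whole megabytes (integer division).
du_used_mb=$((du_used_kb / 1024))

# Space the filesystem holds that du cannot account for, in megabytes.
fs_size_mb=$((fs_size_gb * 1024))
unaccounted_mb=$((fs_size_mb - du_used_mb))

echo "du sees ${du_used_mb} MB in use"
echo "about ${unaccounted_mb} MB is allocated but not visible to du"
```

The second figure is the gap that needs explaining: df says the space is gone, yet no file under /dashboard accounts for it.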

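First and foremost, df and du both give you disk usage stats, but they do not work in the same way: df reports the blocks that the filesystem itself has allocated, whereas du adds up the files it can reach by walking the directory tree. Refer to this article to understand the differences between df and du.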

Secondly, /dashboard was a VxFS filesystem. The dmesg output showed the following:

Feb  1 09:29:00 myhost vxfs: [ID 702911 kern.notice] NOTICE: msgcnt 112748 mesg 001: V-2-1: vx_nospace -  /dev/vx/dsk/A19278-S01-7uitx-dg/dashboard file system full (1 block extent)

An explanation for the above (quite obvious) message is given in this Symantec article.

I found a runaway background process (iostat -x 2) that had been running for the past 2 months. It had been launched by a shell script; the script exited, but the process was never killed. The process was redirecting its output to a file, and that file had since been deleted. Because the process still held the file open through its stdout file descriptor (1), the filesystem could not release the file's blocks, and the process kept writing to it. With no directory entry left, du could not see the file, while df still counted its blocks, so the space it occupied was hidden. To determine how much space the process is actually using through stdout, try the following command (<pid> = process id):

 

ls -l /proc/<pid>/fd/1
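
The same situation can be reproduced safely in a scratch directory. This is a minimal sketch, not what ran on the affected host: it assumes a shell with mktemp available, uses file descriptor 3 of the current shell as a stand-in for the runaway process's stdout, and the names workdir and out.log are made up for illustration.

```shell
#!/bin/sh
# Scratch directory so the demonstration touches nothing real.
workdir=$(mktemp -d)

# Open descriptor 3 for writing, much as a redirection gives a process its stdout.
exec 3> "$workdir/out.log"

# Remove the only name the file has. The descriptor stays open.
rm "$workdir/out.log"

# Writes through the descriptor still succeed and still consume blocks.
echo "still being written" >&3
write_status=$?

# The directory is now empty, so du has nothing to count here.
files_left=$(ls -A "$workdir" | wc -l | tr -d ' ')
echo "directory entries left: $files_left"

# Where /proc exposes open descriptors, the unlinked file can still be inspected.
# (-L follows the link on systems that present the descriptor as a symlink.)
ls -lL "/proc/$$/fd/3" 2>/dev/null || true

# Closing the last descriptor is what lets the filesystem free the blocks.
exec 3>&-
rmdir "$workdir"
```

Until the descriptor is closed, the data written after the rm exists only as blocks charged to the filesystem, which is exactly the state /dashboard was in.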

 


Solution:

I killed the runaway process and the mountpoint utilization dropped to 14%. After that, the df and du outputs agreed.

Root Cause:

A runaway process was consuming most of the disk space, and this consumption was “hidden” because the file to which the process’ stdout was being redirected had been deleted while the process still held it open.
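
One way a launcher script could avoid leaving such a process behind is to kill its background child when it exits. The sketch below is my own illustration, not the original script: start_sampler and stop_sampler are hypothetical names, and sleep stands in for the real iostat command so the example is harmless to run.

```shell
#!/bin/sh
sampler_pid=""

# Start the background job and remember its process id.
start_sampler() {
    sleep 300 > /dev/null 2>&1 &   # stand-in for: iostat -x 2 > some_log_file &
    sampler_pid=$!
}

# Kill the background job if it is still alive, then reap it.
stop_sampler() {
    if [ -n "$sampler_pid" ] && kill -0 "$sampler_pid" 2>/dev/null; then
        kill "$sampler_pid"
        wait "$sampler_pid" 2>/dev/null || true
    fi
    sampler_pid=""
}

# Run the cleanup on normal exit, and turn common signals into an exit.
trap stop_sampler EXIT
trap 'exit 1' HUP INT TERM

start_sampler
started_pid=$sampler_pid

# ... the script's real work would go here ...

stop_sampler
```

With the trap in place, the child is stopped even if the script ends early, so its output file cannot keep growing unnoticed after the script is gone.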

NOTE:

(1) The solution above describes a successful problem-solving experience and may not be applicable to other problems with similar symptoms.


 
