
Cannot grep UTF-16 Unicode files

Problem:

Microsoft SQL Server error logs had I/O errors in them. After transferring 4 months’ logs over to a UNIX machine to analyze them with grep/awk/sed, the grep command returned no output when searching for strings that were clearly present when viewing the file in the vi editor.

Background & Analysis:

On the UNIX host, I checked the file type of the SQL Server error log as follows:

$> file ERRORLOG.1
ERRORLOG.1: Little-endian UTF-16 Unicode English character data, with very long lines,
with CRLF line terminators

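So, the grep command couldn’t match anything in the UTF-16 file. In UTF-16, each ASCII character occupies two bytes, one of which is a NUL, so the bytes of the search string never appear next to each other in the file. Hence, the file had to be converted to an encoding which ‘grep’ could search. The iconv program performs this character encoding conversion.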
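One way to see this for yourself is to look at the raw bytes. The sketch below builds a tiny UTF-16 sample rather than touching a real log; the file name sample_utf16.txt is made up for illustration, and the sample is written without a byte-order mark, which is why the byte order is named explicitly (UTF-16LE) when converting it back.

```shell
# Write a short UTF-16LE sample (hypothetical file name, no byte-order mark).
printf 'I/O\r\n' | iconv -f UTF-8 -t UTF-16LE > sample_utf16.txt

# Dump the bytes: every character is followed by a NUL (\0) byte.
od -c sample_utf16.txt

# A plain grep for the ASCII string finds nothing in the raw file...
if grep -q 'I/O' sample_utf16.txt; then
    echo "raw file: match"
else
    echo "raw file: no match"
fi

# ...but does once the text is converted to UTF-8 first.
if iconv -f UTF-16LE -t UTF-8 sample_utf16.txt | grep -q 'I/O'; then
    echo "converted: match"
else
    echo "converted: no match"
fi
```

The same od check can be run on the first few bytes of a real log to confirm the encoding that the file command reported.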

Solution:

Change the file’s character encoding from UTF-16 to UTF-8 and then perform the grep as follows:

$> iconv -f UTF-16 -t UTF-8 ERRORLOG.1 | grep "SQL Server has encountered.*I/O requests"
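With several months of logs to go through, the same pipeline can be wrapped in a small loop. This is only a sketch: grep_utf16 is a hypothetical helper name, it assumes every file passed in is UTF-16 like ERRORLOG.1 above, and it also strips the CR of the CRLF line terminators so that later awk/sed steps see clean lines.

```shell
# grep_utf16 PATTERN FILE...
# Hypothetical helper: convert each UTF-16 file to UTF-8, search it,
# and prefix every hit with the name of the file it came from.
grep_utf16() {
    pattern=$1
    shift
    for f in "$@"; do
        # Skip names that do not exist or cannot be read.
        [ -r "$f" ] || continue
        iconv -f UTF-16 -t UTF-8 "$f" |
            tr -d '\r' |
            grep -e "$pattern" |
            awk -v name="$f" '{ print name ": " $0 }'
    done
}
```

It could be called as grep_utf16 "SQL Server has encountered.*I/O requests" ERRORLOG.* where the ERRORLOG.* glob is an assumption about how the remaining log files are named.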

Root Cause:

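The grep program matches the bytes of the search string against the bytes of the file, so it cannot search text stored in encodings like UTF-16, where every ASCII character is padded with a NUL byte.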

 

NOTE:

(1) The solution above describes a successful problem-solving experience and may not be applicable to other problems with similar symptoms.

(2) Your rating of this post will be much appreciated. Also, feel free to leave comments.

 
