Cannot grep UTF-16 Unicode files
Problem:
Microsoft SQL Server error logs contained I/O errors. I transferred four months of logs to a UNIX machine to analyze them with grep/awk/sed. There, grep returned no output when searching for strings that were clearly present when the file was viewed in the vi editor.
Background & Analysis:
On the UNIX host, I checked the file type of the SQL Server error log as follows:
$> file ERRORLOG.1
ERRORLOG.1: Little-endian UTF-16 Unicode English character data, with very long lines, with CRLF line terminators
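So grep could not match text in the UTF-16 file. In UTF-16, each ASCII character is stored as two bytes, one of which is a zero (NUL) byte. grep searches the file's bytes, so it sees "S\0Q\0L\0" rather than "SQL", and an ASCII search pattern never matches. The file therefore had to be converted to an encoding that grep can search, and the iconv program performs this character-encoding conversion.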
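To see the effect directly, the following sketch encodes an illustrative log line (made up for this example, not copied from ERRORLOG.1) as UTF-16LE. It then counts matches before and after conversion. It assumes a POSIX shell with iconv, od and mktemp available. The raw count should be 0 and the converted count 1.

```shell
#!/bin/sh
# Illustrative sample line (not from a real log), encoded as UTF-16LE like ERRORLOG.1
sample=$(mktemp)
printf 'SQL Server has encountered 1 occurrence(s) of I/O requests\r\n' \
  | iconv -f UTF-8 -t UTF-16LE > "$sample"

# Dump the first 8 bytes: each ASCII character is followed by a NUL (\0) byte
od -An -c -N8 "$sample"

# Grep the raw UTF-16 bytes: the contiguous ASCII pattern is not present, so the count is 0
raw_count=$(grep -c "SQL Server" "$sample" || true)
# Grep after converting to UTF-8: the pattern now matches once
utf8_count=$(iconv -f UTF-16LE -t UTF-8 "$sample" | grep -c "SQL Server")

echo "raw: $raw_count, converted: $utf8_count"
rm -f "$sample"
```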
Solution:
Convert the file’s character encoding from UTF-16 to UTF-8 and pipe the result into grep:
$> iconv -f UTF-16 -t UTF-8 ERRORLOG.1 | grep "SQL Server has encountered.*I/O requests"
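Because several months of logs were transferred, the same conversion can be run over every log file in one pass. This is a minimal sketch, assuming the files keep SQL Server's default names (ERRORLOG, ERRORLOG.1, ERRORLOG.2, ...) in one directory and begin with a UTF-16 byte-order mark. The function name search_errorlogs is hypothetical. The sketch also strips the carriage return from each CRLF line ending, so patterns anchored with `$` behave as expected.

```shell
#!/bin/sh
# Hypothetical helper: search every SQL Server error log in a directory for the I/O warning.
# Assumes the default names ERRORLOG, ERRORLOG.1, ERRORLOG.2, ... and UTF-16 content with a BOM.
search_errorlogs() {
  dir=${1:-.}                                   # directory holding the logs (default: current)
  pattern='SQL Server has encountered.*I/O requests'
  for f in "$dir"/ERRORLOG "$dir"/ERRORLOG.*; do
    [ -f "$f" ] || continue                     # skip names that do not exist (e.g. an unmatched glob)
    # UTF-16 -> UTF-8, drop the CR of each CRLF, keep matching lines, prefix each with its file name
    iconv -f UTF-16 -t UTF-8 "$f" \
      | tr -d '\r' \
      | grep -e "$pattern" \
      | awk -v name="$f" '{ print name ": " $0 }'
  done
}
```

For example, `search_errorlogs /tmp/sqllogs` (the path is hypothetical) prints each matching line prefixed with the log it came from. If a file has no byte-order mark, `-f UTF-16LE` names the byte order explicitly instead of leaving iconv to guess.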
Root Cause:
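grep interprets its input in the locale's character encoding (typically UTF-8 or a single-byte encoding) and has no UTF-16 mode. ASCII search strings therefore never match the two-bytes-per-character data in a UTF-16 file.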
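When a directory mixes UTF-16 and plain-text files, one option is to check for a UTF-16 byte-order mark before deciding whether to convert. The sketch below assumes the UTF-16 files begin with a BOM (FF FE for little-endian, FE FF for big-endian). A UTF-16 file written without a BOM would not be detected and would be grepped as-is. grep_utf16_aware is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: grep FILE for PATTERN, converting from UTF-16 first only when FILE starts with a UTF-16 BOM.
# Usage: grep_utf16_aware PATTERN FILE
grep_utf16_aware() {
  pattern=$1
  file=$2
  # Read the first two bytes as hex: "fffe" = little-endian BOM, "feff" = big-endian BOM
  bom=$(od -An -tx1 -N2 "$file" | tr -d ' \n')
  case "$bom" in
    fffe|feff)
      # UTF-16 with a BOM: iconv reads the BOM to choose the byte order; tr drops CRs from CRLF endings
      iconv -f UTF-16 -t UTF-8 "$file" | tr -d '\r' | grep -e "$pattern"
      ;;
    *)
      # No UTF-16 BOM: search the file as-is
      grep -e "$pattern" "$file"
      ;;
  esac
}
```

For example, `grep_utf16_aware 'I/O requests' ERRORLOG.1` converts the UTF-16 log before searching, while an ordinary text file is passed straight to grep.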
NOTE:
(1) The solution above describes a successful problem-solving experience and may not be applicable to other problems with similar symptoms.