Linux / Unix: Unicode and HTML Characters Lookup By Name or Number

To look up Unicode and HTML characters by name or number on a Linux or Unix system, you can use the grep command to search through the Unicode character database.

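The Unicode character database file, UnicodeData.txt, lists Unicode characters one per line with their numbers (code points), names, and other properties. Large blocks such as the CJK ideographs are given as a range rather than character by character. The file's location varies by system. On Debian and Ubuntu, for example, the unicode-data package installs it at /usr/share/unicode/UnicodeData.txt, which is the path used in the examples below.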

To search for a character by name, use the grep command with the -i option to ignore case and the -w option to match only whole words. For example, to search for "e acute", use the following command:

grep -iw "e acute" /usr/share/unicode/UnicodeData.txt

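Because -i makes the search case-insensitive, this prints every line that contains the words "E ACUTE". For U+00E9 the match comes from its Unicode 1.0 name, LATIN SMALL LETTER E ACUTE, and the line looks like this (the capital form, U+00C9, is matched as well):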

00E9;LATIN SMALL LETTER E WITH ACUTE;Ll;0;L;0065 0301;;;;N;LATIN SMALL LETTER E ACUTE;;00C9;;00C9
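grep matches the pattern anywhere on the line, including the category and mapping fields. A minimal sketch that restricts the search to the two name fields (the second field, and the eleventh, which holds the Unicode 1.0 name) could use awk instead. The function name ucd_find_name is hypothetical, and the default path is an assumption that may differ on your system.

```sh
# ucd_find_name: hypothetical helper, not a standard command.
# Usage: ucd_find_name PATTERN [FILE]
# Prints "CODEPOINT<TAB>NAME" for every record whose name or Unicode 1.0
# name contains PATTERN, ignoring case.
ucd_find_name() {
    pattern=$1
    # Assumed default location; pass a second argument to override it.
    file=${2:-/usr/share/unicode/UnicodeData.txt}
    awk -F';' -v pat="$pattern" '
        BEGIN { pat = tolower(pat) }
        # index() is a plain substring search, so regex characters in
        # PATTERN (such as "." or "(") are matched literally.
        index(tolower($2), pat) || index(tolower($11), pat) {
            print $1 "\t" $2
        }
    ' "$file"
}
```

For example, ucd_find_name "e acute" prints the code point and current name of each record whose current or Unicode 1.0 name contains "e acute". Unlike grep -w, this is a substring match, so "acut" would also match.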

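In each UnicodeData.txt record, fields are separated by semicolons. In the U+00E9 line, the first field is the character's number (code point) in hexadecimal, the second is its name, the third is its general category (Ll, lowercase letter), the fourth is its canonical combining class, and the fifth is its bidirectional class. The sixth field is the decomposition (e followed by U+0301 COMBINING ACUTE ACCENT), the eleventh is the Unicode 1.0 name, and the thirteenth and fifteenth fields give the uppercase and titlecase mappings (U+00C9).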
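To make the field positions easier to read, a minimal sketch can print one record with each field labelled. The function name ucd_fields and its labels are illustrative only, and the default path is an assumption.

```sh
# ucd_fields: hypothetical helper, not a standard command.
# Usage: ucd_fields CODEPOINT [FILE]   e.g. ucd_fields 00E9
# Prints the main fields of one UnicodeData.txt record, one per line.
ucd_fields() {
    cp=$1
    # Assumed default location; pass a second argument to override it.
    file=${2:-/usr/share/unicode/UnicodeData.txt}
    awk -F';' -v cp="$cp" '
        function show(label, value) { printf "%-18s%s\n", label ":", value }
        # toupper() forces a string comparison and lets "00e9" match "00E9".
        # A bare $1 == cp could compare numerically: awk reads "00E9" as
        # the number 0E9 (zero), which would also match "0000".
        toupper($1) == toupper(cp) {
            show("code point", $1)
            show("name", $2)
            show("general category", $3)
            show("combining class", $4)
            show("bidi class", $5)
            show("decomposition", $6)
            show("Unicode 1.0 name", $11)
            show("uppercase", $13)
            show("lowercase", $14)
            show("titlecase", $15)
        }
    ' "$file"
}
```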

To search for a character by number, anchor the pattern to the start of the line with ^ so that only the first field (the code point) can match, and end it with the ; field separator. Combining -F and -E is contradictory and neither is needed here: -F would make grep treat ^ as a literal character, so nothing would match. Code points are written in uppercase hexadecimal with at least four digits, so to search for the character with the number 00E9, use:

grep '^00E9;' /usr/share/unicode/UnicodeData.txt

This prints the single record for U+00E9, the same line shown above. If you type the number in lowercase, add the -i option.
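Numbers are often written as U+00E9, 0xE9, or decimal 233 rather than in the file's format. A minimal sketch that normalises these forms before searching, and also prints the matching HTML numeric character references, is below. The function name ucd_by_number and its accepted input forms are assumptions for illustration, as is the default path.

```sh
# ucd_by_number: hypothetical helper, not a standard command.
# Usage: ucd_by_number NUMBER [FILE]
# NUMBER is hex with a U+ or 0x prefix (U+00E9, 0xe9) or plain decimal (233).
# Decimal input must not have leading zeros: shell arithmetic reads those as octal.
ucd_by_number() {
    arg=$1
    # Assumed default location; pass a second argument to override it.
    file=${2:-/usr/share/unicode/UnicodeData.txt}
    case $arg in
        [Uu]+*|0[xX]*)
            digits=${arg#??}            # drop the two-character prefix
            case $digits in ''|*[!0-9A-Fa-f]*) echo "bad hex number: $arg" >&2; return 2 ;; esac
            dec=$((0x$digits)) ;;
        *)
            case $arg in ''|*[!0-9]*) echo "bad decimal number: $arg" >&2; return 2 ;; esac
            dec=$((arg)) ;;
    esac
    # UnicodeData.txt writes code points as uppercase hex, at least 4 digits.
    hex=$(printf '%04X' "$dec")
    # Anchor at the start and include ';' so only the code point field can match.
    grep "^$hex;" "$file" || return 1
    # HTML numeric character references for the same code point.
    printf 'HTML: &#x%s; or &#%d;\n' "$hex" "$dec"
}
```

Large ranges such as CJK ideographs and Hangul syllables are stored in UnicodeData.txt as a pair of "First" and "Last" records, so a code point inside such a range (other than its endpoints) will not be found this way.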

Created Time:2017-10-29 22:08:59  Author:lautturi