Unix / Linux: Remove duplicate lines from a text file using awk or perl

To remove duplicate lines from a text file using awk, you can use the following command:

awk '!seen[$0]++' input_file > output_file

This reads the input file line by line, using each line as a key in the seen array. The first time a line is encountered, seen[$0] is 0, so the pattern !seen[$0]++ is true and the line is printed to the output file; the post-increment then marks that line as seen. On every later occurrence, seen[$0] is non-zero, the pattern is false, and the duplicate line is skipped.
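For example, given a hypothetical file named input_file containing:

apple
banana
apple
cherry
banana

running awk '!seen[$0]++' input_file prints each unique line once, preserving the original order:

apple
banana
cherry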

Alternatively, you can use the perl command to remove duplicate lines from a text file using the following command:

perl -ne 'print if ! $seen{$_}++' input_file > output_file

This command reads the input file line by line and uses each line as a key in the %seen hash. The first time a line appears, $seen{$_} is false, so the line is printed; the post-increment then records it. On any later occurrence, $seen{$_} is non-zero, so the duplicate line is skipped.
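If the one-liner is hard to read, the same logic can be written as a short standalone script. This is a sketch that reads from filenames given on the command line or from standard input:

#!/usr/bin/perl
# Print each line only the first time it is seen
use strict;
use warnings;

my %seen;
while (my $line = <>) {
    print $line unless $seen{$line}++;
}

Save it under a name of your choice (for example dedup.pl) and run it as perl dedup.pl input_file > output_file.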

Both commands remove duplicate lines from the input file and write the deduplicated output to a new file. If you want to modify the input file in place, use the -i option with perl, like this:

perl -i -ne 'print if ! $seen{$_}++' input_file

This will modify the input_file directly, removing any duplicate lines.
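If you want to keep a copy of the original while editing in place, give -i a suffix and perl will save the original file with that suffix before rewriting it:

perl -i.bak -ne 'print if ! $seen{$_}++' input_file

This writes the deduplicated lines to input_file and leaves the unmodified original in input_file.bak.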

Created Time: 2017-10-30 14:27:30  Author: lautturi