BASH - Commands - sort - Numeric Sort Bug
A test file:
- test.txt
192.168.1.123.25:12345
10.0.0.1:80
192.168.1.123.125:12345
10.0.0.1:8080
sort -n test.txt | uniq
returns:
10.0.0.1:80
10.0.0.1:8080
192.168.1.123.125:12345
192.168.1.123.25:12345
sort -un test.txt
returns:
10.0.0.1:80
192.168.1.123.25:12345
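NOTE: With -n, sort compares only the leading numeric prefix of each line, so 10.0.0.1:80 and 10.0.0.1:8080 both compare as 10.0, and both 192.168.1.123.* lines compare as 192.168. The -u option keeps only the first line of each group that compares equal, whereas uniq compares whole lines, which is why the two commands give different results. The info page for sort explains the numeric comparison, though the man page does not mention it: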
Numeric sort uses what might be considered an unconventional method to compare strings representing floating point numbers.
Rather than first converting each string to the C `double' type and then comparing those values, `sort' aligns the decimal-point characters in the two strings and compares the strings a character at a time.
One benefit of using this approach is its speed. In practice this is much more efficient than performing the two corresponding string-to-double (or even string-to-integer) conversions and then comparing doubles.
In addition, there is no corresponding loss of precision.
Converting each string to `double' before comparison would limit precision to about 16 digits on most systems.
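The effect described above can be reproduced without a file. This is a minimal sketch, assuming GNU sort and the C locale (so that '.' is the decimal point); the variable names input, deduped and full are only for illustration.

```shell
# Same four lines as test.txt, kept in a variable so the example is self-contained.
input='192.168.1.123.25:12345
10.0.0.1:80
192.168.1.123.125:12345
10.0.0.1:8080'

# -n compares only the leading number of each line (192.168 or 10.0),
# and -u keeps one line per group that compares equal: two lines remain.
deduped=$(printf '%s\n' "$input" | LC_ALL=C sort -un)
printf '%s\n' "$deduped"

# Without -n, -u compares whole lines, so all four distinct lines survive.
full=$(printf '%s\n' "$input" | LC_ALL=C sort -u)
printf '%s\n' "$full"
```

If unique lines in numeric order are wanted, piping sort -n into uniq, as in the first command of this page, avoids the loss.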
Use sort to correctly sort IP addresses
sort -t . -k1,1n -k2,2n -k3,3n -k4,4n test.txt
returns:
10.0.0.1:80
10.0.0.1:8080
192.168.1.123.125:12345
192.168.1.123.25:12345
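The four keys in the command above cover the four dot-separated octets. The sample data has a fifth dot-separated component on two lines, and that part is still ordered by the fallback whole-line byte comparison, which is why .125 sorts before .25. A possible extension, assuming the same test data and the C locale, is to add a fifth numeric key; the variable names input and sorted are only for illustration.

```shell
# Same four lines as test.txt, kept in a variable so the example is self-contained.
input='192.168.1.123.25:12345
10.0.0.1:80
192.168.1.123.125:12345
10.0.0.1:8080'

# Fields are split on '.', and each of the first five fields is compared numerically.
# A missing fifth field is an empty key, which -n treats as 0; lines that tie on
# every key are ordered by a final whole-line byte comparison.
sorted=$(printf '%s\n' "$input" |
  LC_ALL=C sort -t . -k1,1n -k2,2n -k3,3n -k4,4n -k5,5n)
printf '%s\n' "$sorted"
```

With the fifth key, 192.168.1.123.25:12345 sorts before 192.168.1.123.125:12345. The port after the colon is not a separate key here; the two 10.0.0.1 lines are ordered only by the final byte comparison.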