BASH - Commands - sort - Numeric Sort Bug

A test file:

test.txt
192.168.1.123.25:12345
10.0.0.1:80
192.168.1.123.125:12345
10.0.0.1:8080

sort -n test.txt | uniq

returns:

10.0.0.1:80
10.0.0.1:8080
192.168.1.123.125:12345
192.168.1.123.25:12345

sort -un test.txt

returns:

10.0.0.1:80
192.168.1.123.25:12345

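NOTE: With -u, sort keeps only one line from each group of lines whose keys compare equal, and with -n the key is just the leading number on the line (10.0 or 192.168 here). The info page for sort explains how the numeric comparison works, though the man page does not mention it: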

Numeric sort uses what might be considered an unconventional method to compare strings representing floating point numbers.

Rather than first converting each string to the C `double' type and then comparing those values, `sort' aligns the decimal-point characters in the two strings and compares the strings a character at a time.

One benefit of using this approach is its speed. In practice this is much more efficient than performing the two corresponding string-to-double (or even string-to-integer) conversions and then comparing doubles.

In addition, there is no corresponding loss of precision.

Converting each string to `double' before comparison would limit precision to about 16 digits on most systems.
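The two results above can be reproduced side by side. This is a minimal sketch, assuming a sort that behaves like GNU sort and a scratch file made with mktemp; the variable names tmp, with_uniq and with_u are illustrative only and are not part of the original page.

```shell
# Recreate the test file from this page in a temporary location.
tmp=$(mktemp)
printf '%s\n' \
  '192.168.1.123.25:12345' \
  '10.0.0.1:80' \
  '192.168.1.123.125:12345' \
  '10.0.0.1:8080' > "$tmp"

# uniq compares whole lines, so every distinct line survives.
with_uniq=$(sort -n "$tmp" | uniq | grep -c .)

# sort -u compares only the numeric key (the leading number on each line),
# so lines that share that number are treated as duplicates.
with_u=$(sort -un "$tmp" | grep -c .)

echo "sort -n | uniq keeps $with_uniq lines"
echo "sort -un keeps $with_u lines"

rm -f "$tmp"
```

The line counts differ because uniq and sort -u do not use the same definition of a duplicate: uniq compares complete lines, while sort -u compares only the sort key.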


Use sort to correctly sort IP addresses

sort -t . -k1,1n -k2,2n -k3,3n -k4,4n test.txt

returns:

10.0.0.1:80
10.0.0.1:8080
192.168.1.123.125:12345
192.168.1.123.25:12345
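
Adding -u to the per-octet command would collapse lines again, because 10.0.0.1:80 and 10.0.0.1:8080 have equal numeric keys (their fourth fields, 1:80 and 1:8080, both read as 1). One possible way around this, sketched here under the assumption that only fully identical lines should be dropped, is to leave -u out and pipe the result to uniq. The function name sort_ips and the variable deduped are hypothetical, introduced only for this example.

```shell
# Hypothetical helper: sort by the first four dot-separated fields,
# then drop only lines that are identical in full.
sort_ips() {
  sort -t . -k1,1n -k2,2n -k3,3n -k4,4n "$@" | uniq
}

# Usage: the input contains one exact duplicate of 10.0.0.1:80.
deduped=$(printf '%s\n' \
  '192.168.1.123.25:12345' \
  '10.0.0.1:80' \
  '10.0.0.1:80' \
  '192.168.1.123.125:12345' \
  '10.0.0.1:8080' | sort_ips)

printf '%s\n' "$deduped"
```

This works because sort without -u falls back to comparing whole lines when all keys tie, so identical lines end up adjacent and uniq can remove them.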
bash/commands/sort/numeric_sort_bug.txt · Last modified: 2021/01/30 15:50 by peter
