The POSIX standard used to say that all string comparisons are performed based on the locale’s collating order. This is the order in which characters sort, as defined by the locale (for more discussion, see Where You Are Makes a Difference). This order is usually very different from the order produced by a straight byte-by-byte comparison.[38]
Because this behavior differs considerably from existing practice, gawk only implemented it when in POSIX mode (see Command-Line Options).
Here is an example to illustrate the difference, in an en_US.UTF-8 locale:

    $ gawk 'BEGIN { printf("ABC < abc = %s\n",
    >                     ("ABC" < "abc" ? "TRUE" : "FALSE")) }'
    -| ABC < abc = TRUE
    $ gawk --posix 'BEGIN { printf("ABC < abc = %s\n",
    >                             ("ABC" < "abc" ? "TRUE" : "FALSE")) }'
    -| ABC < abc = FALSE
Fortunately, as of August 2016, comparison based on locale collating order is no longer required for the == and != operators.[39] However, comparison based on locales is still required for <, <=, >, and >=. POSIX thus recommends as follows:
Since the == operator checks whether strings are identical, not whether they collate equally, applications needing to check whether strings collate equally can use:

    a <= b && a >= b
As of version 4.2, gawk continues to use locale collating order for <, <=, >, and >= only in POSIX mode.
Technically, string comparison is supposed to behave as if the strings were compared with the C strcoll() function.