TOC 
Network Working GroupS. Josefsson
Internet-DraftNovember 5, 2003
Expires: May 5, 2004 

Nameprep and IDNA Test Vectors
draft-josefsson-idn-test-vectors

Status of this Memo

This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of RFC2026.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt.

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

This Internet-Draft will expire on May 5, 2004.

Copyright Notice

Copyright (C) The Internet Society (2003). All Rights Reserved.

Abstract

This document contains test vectors for Nameprep and IDNA.



 TOC 

Table of Contents




 TOC 

1. Introduction

The Nameprep and IDNA specifications lack thorough examples that would have aided in implementing them. This document act as a complement to those specifications providing such examples.

It should be pointed out that this document is not normative, and thus any errors in this document should not be treated as gospel that defines Nameprep nor IDNA. When conforming to the specification and generating output corresponding to values in this document is in conflict, implementations should conform to the specification.



 TOC 

2. Format of Nameprep Test Vectors

The tests follow a certain syntax, described here by showing one complete example with comments intermixed. The comments are prefixed with the '#' character.

# First the (UTF-8) string is printed as a C octet string, with
# characters [A-Za-z .0-9] shown inline and other characters shown
# escaped with \xAB where AB is the hex sequence of that octet.  The
# number of octets are also shown.

   in (length 3 bytes):
   	\xE1\xBE\xB7

# The input is also printed as Unicode codepoints.

   input (length 1):
   	U+1fb7

# After printing the input, the nameprep steps starts.  When the
# string is modified, the specific operation that caused it is printed
# along with the new string of Unicode code points.

# 1) Map -- For each character in the input, check if it has a mapping
#    and, if so, replace it with its mapping.  This is described in
#    section 3.

   Table B.2 maps U+1fb7 to U+03b1 U+0342 U+03b9.
   U+03b1 U+0342 U+03b9

# 2) Normalize -- Possibly normalize the result of step 1 using Unicode
#    normalization.  This is described in section 4.

   Unicode normalization with form KC maps string into:
   U+1fb6 U+03b9

# 3) Prohibit -- Check for any characters that are not allowed in the
#    output.  If any are found, return an error.  This is described in
#    section 5.

# 4) Check bidi -- Possibly check for right-to-left characters, and if
#    any are found, make sure that the whole string satisfies the
#    requirements for bidirectional strings.  If the string does not
#    satisfy the requirements for bidirectional strings, return an
#    error.  This is described in section 6.
#
#    1) The characters in section 5.8 MUST be prohibited.

#    2) If a string contains any RandALCat character, the string MUST NOT
#       contain any LCat character.

#    3) If a string contains any RandALCat character, a RandALCat
#       character MUST be the first character of the string, and a
#       RandALCat character MUST be the last character of the string.

# The output is printed as Unicode codepoints.

   output (length 2):
   	U+1fb6 U+03b9

# And finally the output is printed as UTF-8

   out (length 5 bytes):
   	\xE1\xBE\xB6\xCE\xB9


 TOC 

3. Format of IDNA Test Vectors

The tests follow a certain syntax, described here by showing one complete example with comments intermixed. The comments are prefixed with the '#' character.

# First the (UTF-8) string is printed as a C octet string, with
# characters [A-Za-z .0-9] shown inline and other characters shown
# escaped with \xAB where AB is the hex sequence of that octet.  The
# number of octets are also shown.

   in (length 39 bytes):
   	'Hello\x2DAnother\x2DWa'
   	'y\x2D\xE3\x81\x9D\xE3\x82\x8C\xE3\x81\x9E\xE3\x82\x8C\xE3\x81'
   	'\xAE\xE5\xA0\xB4\xE6\x89\x80

# The input is also printed as Unicode codepoints.

   input (length 39):
   	U+0048 U+0065 U+006c U+006c U+006f U+002d U+0041 U+006e
   	U+006f U+0074 U+0068 U+0065 U+0072 U+002d U+0057 U+0061
   	U+0079 U+002d U+305d U+308c U+305e U+308c U+306e U+5834
   	U+6240

# After printing the input, the IDNA ToASCII step starts.  The output
# is printed as an ASCII string.

   out: xn--hello-another-way--fc4qua05auwb3674vfr0b



 TOC 

4. Nameprep Test Vectors



 TOC 

5. IDNA ToASCII Test Vectors

5.1 Arabic (Egyptian)

in (length 34 bytes):
	'\xD9\x84\xD9\x8A\xD9\x87\xD9\x85\xD8\xA7\xD8\xA8\xD8\xAA\xD9\x83'
	'\xD9\x84\xD9\x85\xD9\x88\xD8\xB4\xD8\xB9\xD8\xB1\xD8\xA8\xD9\x8A'
	'\xD8\x9F
input (length 34):
	U+0644 U+064a U+0647 U+0645 U+0627 U+0628 U+062a U+0643 
	U+0644 U+0645 U+0648 U+0634 U+0639 U+0631 U+0628 U+064a 
	U+061f 

out: xn--egbpdaj6bu4bxfgehfvwxn

5.2 Chinese (simplified)

in (length 27 bytes):
	'\xE4\xBB\x96\xE4\xBB\xAC\xE4\xB8\xBA\xE4\xBB\x80\xE4\xB9\x88\xE4'
	'\xB8\x8D\xE8\xAF\xB4\xE4\xB8\xAD\xE6\x96\x87
input (length 27):
	U+4ed6 U+4eec U+4e3a U+4ec0 U+4e48 U+4e0d U+8bf4 U+4e2d 
	U+6587 

out: xn--ihqwcrb4cv8a8dqg056pqjye

5.3 Chinese (traditional)

in (length 27 bytes):
	'\xE4\xBB\x96\xE5\x80\x91\xE7\x88\xB2\xE4\xBB\x80\xE9\xBA\xBD\xE4'
	'\xB8\x8D\xE8\xAA\xAA\xE4\xB8\xAD\xE6\x96\x87
input (length 27):
	U+4ed6 U+5011 U+7232 U+4ec0 U+9ebd U+4e0d U+8aaa U+4e2d 
	U+6587 

out: xn--ihqwctvzc91f659drss3x8bo0yb

5.4 Czech

in (length 26 bytes):
	'Pro\xC4\x8Dprost\xC4\x9Bneml'
	'uv\xC3\xAD\xC4\x8Desky
input (length 26):
	U+0050 U+0072 U+006f U+010d U+0070 U+0072 U+006f U+0073 
	U+0074 U+011b U+006e U+0065 U+006d U+006c U+0075 U+0076 
	U+00ed U+010d U+0065 U+0073 U+006b U+0079 

out: xn--proprostnemluvesky-uyb24dma41a

5.5 Hebrew

in (length 44 bytes):
	'\xD7\x9C\xD7\x9E\xD7\x94\xD7\x94\xD7\x9D\xD7\xA4\xD7\xA9\xD7\x95'
	'\xD7\x98\xD7\x9C\xD7\x90\xD7\x9E\xD7\x93\xD7\x91\xD7\xA8\xD7\x99'
	'\xD7\x9D\xD7\xA2\xD7\x91\xD7\xA8\xD7\x99\xD7\xAA
input (length 44):
	U+05dc U+05de U+05d4 U+05d4 U+05dd U+05e4 U+05e9 U+05d5 
	U+05d8 U+05dc U+05d0 U+05de U+05d3 U+05d1 U+05e8 U+05d9 
	U+05dd U+05e2 U+05d1 U+05e8 U+05d9 U+05ea 

out: xn--4dbcagdahymbxekheh6e0a7fei0b

5.6 Hindi (Devanagari)

in (length 90 bytes):
	'\xE0\xA4\xAF\xE0\xA4\xB9\xE0\xA4\xB2\xE0\xA5\x8B\xE0\xA4\x97\xE0'
	'\xA4\xB9\xE0\xA4\xBF\xE0\xA4\xA8\xE0\xA5\x8D\xE0\xA4\xA6\xE0\xA5'
	'\x80\xE0\xA4\x95\xE0\xA5\x8D\xE0\xA4\xAF\xE0\xA5\x8B\xE0\xA4\x82'
	'\xE0\xA4\xA8\xE0\xA4\xB9\xE0\xA5\x80\xE0\xA4\x82\xE0\xA4\xAC\xE0'
	'\xA5\x8B\xE0\xA4\xB2\xE0\xA4\xB8\xE0\xA4\x95\xE0\xA4\xA4\xE0\xA5'
	'\x87\xE0\xA4\xB9\xE0\xA5\x88\xE0\xA4\x82
input (length 90):
	U+092f U+0939 U+0932 U+094b U+0917 U+0939 U+093f U+0928 
	U+094d U+0926 U+0940 U+0915 U+094d U+092f U+094b U+0902 
	U+0928 U+0939 U+0940 U+0902 U+092c U+094b U+0932 U+0938 
	U+0915 U+0924 U+0947 U+0939 U+0948 U+0902 

out: xn--i1baa7eci9glrd9b2ae1bj0hfcgg6iyaf8o0a1dig0cd

5.7 Japanese (kanji and hiragana)

in (length 54 bytes):
	'\xE3\x81\xAA\xE3\x81\x9C\xE3\x81\xBF\xE3\x82\x93\xE3\x81\xAA\xE6'
	'\x97\xA5\xE6\x9C\xAC\xE8\xAA\x9E\xE3\x82\x92\xE8\xA9\xB1\xE3\x81'
	'\x97\xE3\x81\xA6\xE3\x81\x8F\xE3\x82\x8C\xE3\x81\xAA\xE3\x81\x84'
	'\xE3\x81\xAE\xE3\x81\x8B
input (length 54):
	U+306a U+305c U+307f U+3093 U+306a U+65e5 U+672c U+8a9e 
	U+3092 U+8a71 U+3057 U+3066 U+304f U+308c U+306a U+3044 
	U+306e U+304b 

out: xn--n8jok5ay5dzabd5bym9f0cm5685rrjetr6pdxa

5.8 Russian (Cyrillic)

in (length 56 bytes):
	'\xD0\xBF\xD0\xBE\xD1\x87\xD0\xB5\xD0\xBC\xD1\x83\xD0\xB6\xD0\xB5'
	'\xD0\xBE\xD0\xBD\xD0\xB8\xD0\xBD\xD0\xB5\xD0\xB3\xD0\xBE\xD0\xB2'
	'\xD0\xBE\xD1\x80\xD1\x8F\xD1\x82\xD0\xBF\xD0\xBE\xD1\x80\xD1\x83'
	'\xD1\x81\xD1\x81\xD0\xBA\xD0\xB8
input (length 56):
	U+043f U+043e U+0447 U+0435 U+043c U+0443 U+0436 U+0435 
	U+043e U+043d U+0438 U+043d U+0435 U+0433 U+043e U+0432 
	U+043e U+0440 U+044f U+0442 U+043f U+043e U+0440 U+0443 
	U+0441 U+0441 U+043a U+0438 

out: xn--b1abfaaepdrnnbgefbadotcwatmq2g4l

5.9 Spanish

in (length 42 bytes):
	'Porqu\xC3\xA9nopuedens'
	'implementehablar'
	'enEspa\xC3\xB1ol
input (length 42):
	U+0050 U+006f U+0072 U+0071 U+0075 U+00e9 U+006e U+006f 
	U+0070 U+0075 U+0065 U+0064 U+0065 U+006e U+0073 U+0069 
	U+006d U+0070 U+006c U+0065 U+006d U+0065 U+006e U+0074 
	U+0065 U+0068 U+0061 U+0062 U+006c U+0061 U+0072 U+0065 
	U+006e U+0045 U+0073 U+0070 U+0061 U+00f1 U+006f U+006c 
	

out: xn--porqunopuedensimplementehablarenespaol-fmd56a

5.10 Vietnamese

in (length 45 bytes):
	'T\xE1\xBA\xA1isaoh\xE1\xBB\x8Dkh\xC3\xB4'
	'ngth\xE1\xBB\x83ch\xE1\xBB\x89n\xC3\xB3i'
	'ti\xE1\xBA\xBFngVi\xE1\xBB\x87t
input (length 45):
	U+0054 U+1ea1 U+0069 U+0073 U+0061 U+006f U+0068 U+1ecd 
	U+006b U+0068 U+00f4 U+006e U+0067 U+0074 U+0068 U+1ec3 
	U+0063 U+0068 U+1ec9 U+006e U+00f3 U+0069 U+0074 U+0069 
	U+1ebf U+006e U+0067 U+0056 U+0069 U+1ec7 U+0074 

out: xn--tisaohkhngthchnitingvit-kjcr8268qyxafd2f1b9g

5.11 Japanese

in (length 20 bytes):
	'3\xE5\xB9\xB4B\xE7\xB5\x84\xE9\x87\x91\xE5\x85\xAB\xE5\x85'
	'\x88\xE7\x94\x9F
input (length 20):
	U+0033 U+5e74 U+0042 U+7d44 U+91d1 U+516b U+5148 U+751f 
	

out: xn--3b-ww4c5e180e575a65lsy2b

5.12 Japanese

in (length 34 bytes):
	'\xE5\xAE\x89\xE5\xAE\xA4\xE5\xA5\x88\xE7\xBE\x8E\xE6\x81\xB5\x2D'
	'with\x2DSUPER\x2DMONKE'
	'YS
input (length 34):
	U+5b89 U+5ba4 U+5948 U+7f8e U+6075 U+002d U+0077 U+0069 
	U+0074 U+0068 U+002d U+0053 U+0055 U+0050 U+0045 U+0052 
	U+002d U+004d U+004f U+004e U+004b U+0045 U+0059 U+0053 
	

out: xn---with-super-monkeys-pc58ag80a8qai00g7n9n

5.13 Japanese

in (length 39 bytes):
	'Hello\x2DAnother\x2DWa'
	'y\x2D\xE3\x81\x9D\xE3\x82\x8C\xE3\x81\x9E\xE3\x82\x8C\xE3\x81'
	'\xAE\xE5\xA0\xB4\xE6\x89\x80
input (length 39):
	U+0048 U+0065 U+006c U+006c U+006f U+002d U+0041 U+006e 
	U+006f U+0074 U+0068 U+0065 U+0072 U+002d U+0057 U+0061 
	U+0079 U+002d U+305d U+308c U+305e U+308c U+306e U+5834 
	U+6240 

out: xn--hello-another-way--fc4qua05auwb3674vfr0b

5.14 Japanese

in (length 22 bytes):
	'\xE3\x81\xB2\xE3\x81\xA8\xE3\x81\xA4\xE5\xB1\x8B\xE6\xA0\xB9\xE3'
	'\x81\xAE\xE4\xB8\x8B2
input (length 22):
	U+3072 U+3068 U+3064 U+5c4b U+6839 U+306e U+4e0b U+0032 
	

out: xn--2-u9tlzr9756bt3uc0v

5.15 Japanese

in (length 23 bytes):
	'Maji\xE3\x81\xA7Koi\xE3\x81\x99\xE3\x82\x8B'
	'5\xE7\xA7\x92\xE5\x89\x8D
input (length 23):
	U+004d U+0061 U+006a U+0069 U+3067 U+004b U+006f U+0069 
	U+3059 U+308b U+0035 U+79d2 U+524d 

out: xn--majikoi5-783gue6qz075azm5e

5.16 Japanese

in (length 23 bytes):
	'\xE3\x83\x91\xE3\x83\x95\xE3\x82\xA3\xE3\x83\xBCde\xE3\x83'
	'\xAB\xE3\x83\xB3\xE3\x83\x90
input (length 23):
	U+30d1 U+30d5 U+30a3 U+30fc U+0064 U+0065 U+30eb U+30f3 
	U+30d0 

out: xn--de-jg4avhby1noc0d

5.17 Japanese

in (length 21 bytes):
	'\xE3\x81\x9D\xE3\x81\xAE\xE3\x82\xB9\xE3\x83\x94\xE3\x83\xBC\xE3'
	'\x83\x89\xE3\x81\xA7
input (length 21):
	U+305d U+306e U+30b9 U+30d4 U+30fc U+30c9 U+3067 

out: xn--d9juau41awczczp

5.18 Greek

in (length 16 bytes):
	'\xCE\xB5\xCE\xBB\xCE\xBB\xCE\xB7\xCE\xBD\xCE\xB9\xCE\xBA\xCE\xAC
input (length 16):
	U+03b5 U+03bb U+03bb U+03b7 U+03bd U+03b9 U+03ba U+03ac 
	

out: xn--hxargifdar

5.19 Maltese (Malti)

in (length 13 bytes):
	'bon\xC4\xA1usa\xC4\xA7\xC4\xA7a
input (length 13):
	U+0062 U+006f U+006e U+0121 U+0075 U+0073 U+0061 U+0127 
	U+0127 U+0061 

out: xn--bonusaa-5bb1da

5.20 Russian (Cyrillic)

in (length 56 bytes):
	'\xD0\xBF\xD0\xBE\xD1\x87\xD0\xB5\xD0\xBC\xD1\x83\xD0\xB6\xD0\xB5'
	'\xD0\xBE\xD0\xBD\xD0\xB8\xD0\xBD\xD0\xB5\xD0\xB3\xD0\xBE\xD0\xB2'
	'\xD0\xBE\xD1\x80\xD1\x8F\xD1\x82\xD0\xBF\xD0\xBE\xD1\x80\xD1\x83'
	'\xD1\x81\xD1\x81\xD0\xBA\xD0\xB8
input (length 56):
	U+043f U+043e U+0447 U+0435 U+043c U+0443 U+0436 U+0435 
	U+043e U+043d U+0438 U+043d U+0435 U+0433 U+043e U+0432 
	U+043e U+0440 U+044f U+0442 U+043f U+043e U+0440 U+0443 
	U+0441 U+0441 U+043a U+0438 

out: xn--b1abfaaepdrnnbgefbadotcwatmq2g4l


 TOC 

6. IDNA ToUnicode Test Vectors



 TOC 

7. Auxiliary Test Vectors

These test vectors do not test Nameprep nor IDNA proper, rather they test the UTF-8 handling of software. Instead of outputting the indicated Unicode code point, they should raise an error that the input was invalid.

7.1 Incorrect UTF-8 encoding of U+00DF

in (length 2 bytes):
	\xC3\xDF
output (length 1):
	U+00df 

7.2 Incorrect UTF-8 encoding of U+01F0

in (length 2 bytes):
	\xC7\xF0
output (length 1):
	U+01f0 


 TOC 

8. Security Considerations

The security considerations from Nameprep and IDNA are inherited.

These test vectors are not believed to introduce new security considerations nor disrupt the operation of the Internet, but may expose security weaknesses in existing implementations. Any such incident should not be regarded as a problem with this document, though, but rather taken as evidence that this document served its purpose.



 TOC 

Normative References

[1] Hoffman, P. and M. Blanchet, "Nameprep: A Stringprep Profile for Internationalized Domain Names (IDN)", RFC 3491, March 2003.
[2] Faltstrom, P., Hoffman, P. and A. Costello, "Internationalizing Domain Names in Applications (IDNA)", RFC 3490, March 2003.


 TOC 

Informative References

[3] Costello, A., "Punycode: A Bootstring encoding of Unicode for Internationalized Domain Names in Applications (IDNA)", RFC 3492, March 2003.


 TOC 

Author's Address

  Simon Josefsson
EMail:  simon@josefsson.org

Acknowledgments

Some IDNA test vectors were borrowed from Punycode [3]. Several people pointed out invalid UTF-8 encodings used in earlier versions of the draft.



 TOC 

Appendix A. Nameprep test vectors in C syntax

In order to avoid having implementors type in the test vectors above, a C structure with the data is provided.

The comment field is the section titles used in this document. The in field contains UTF-8 encoded strings. The out field contains expected output, or NULL if the expected result is an error. The profile field can be ignored. The only significant setting for the flags field is STRINGPREP_NO_UNASSIGNED which signals to the Nameprep implementation that it should perform unassigned code point checking, aka the "AllowUnassigned" flag. The rc field contains expected error codes, where 0 indicates success and the other flags should be self explanatory.

struct stringprep
{
  char *comment;
  char *in;
  char *out;
  char *profile;
  int flags;
  int rc;
}
strprep[] =
{
  {
    "Map to nothing",
    "foo\xC2\xAD\xCD\x8F\xE1\xA0\x86\xE1\xA0\x8B"
    "bar""\xE2\x80\x8B\xE2\x81\xA0""baz\xEF\xB8\x80\xEF\xB8\x88"
    "\xEF\xB8\x8F\xEF\xBB\xBF", "foobarbaz"
  },
  {
    "Case folding ASCII U+0043 U+0041 U+0046 U+0045",
    "CAFE", "cafe"
  },
  {
    "Case folding 8bit U+00DF (german sharp s)",
    "\xC3\x9F", "ss"
  },
  {
    "Case folding U+0130 (turkish capital I with dot)",
    "\xC4\xB0", "i\xcc\x87"
  },
  {
    "Case folding multibyte U+0143 U+037A",
    "\xC5\x83\xCD\xBA", "\xC5\x84 \xCE\xB9"
  },
  {
    "Case folding U+2121 U+33C6 U+1D7BB",
    "\xE2\x84\xA1\xE3\x8F\x86\xF0\x9D\x9E\xBB",
    "telc\xE2\x88\x95""kg\xCF\x83"
  },
  {
    "Normalization of U+006a U+030c U+00A0 U+00AA",
    "\x6A\xCC\x8C\xC2\xA0\xC2\xAA", "\xC7\xB0 a"
  },
  {
    "Case folding U+1FB7 and normalization",
    "\xE1\xBE\xB7", "\xE1\xBE\xB6\xCE\xB9"
  },
  {
    "Self-reverting case folding U+01F0 and normalization",
    "\xC7\xF0", "\xC7\xB0"
  },
  {
    "Self-reverting case folding U+0390 and normalization",
    "\xCE\x90", "\xCE\x90"
  },
  {
    "Self-reverting case folding U+03B0 and normalization",
    "\xCE\xB0", "\xCE\xB0"
  },
  {
    "Self-reverting case folding U+1E96 and normalization",
    "\xE1\xBA\x96", "\xE1\xBA\x96"
  },
  {
    "Self-reverting case folding U+1F56 and normalization",
    "\xE1\xBD\x96", "\xE1\xBD\x96"
  },
  {
    "ASCII space character U+0020",
    "\x20", "\x20"
  },
  {
    "Non-ASCII 8bit space character U+00A0",
    "\xC2\xA0", "\x20"
  },
  {
    "Non-ASCII multibyte space character U+1680",
    "\xE1\x9A\x80", NULL, "Nameprep", 0,
    STRINGPREP_CONTAINS_PROHIBITED
  },
  {
    "Non-ASCII multibyte space character U+2000",
    "\xE2\x80\x80", "\x20"
  },
  {
    "Zero Width Space U+200b",
    "\xE2\x80\x8b", ""
  },
  {
    "Non-ASCII multibyte space character U+3000",
    "\xE3\x80\x80", "\x20"
  },
  {
    "ASCII control characters U+0010 U+007F",
    "\x10\x7F", "\x10\x7F"
  },
  {
    "Non-ASCII 8bit control character U+0085",
    "\xC2\x85", NULL, "Nameprep", 0,
    STRINGPREP_CONTAINS_PROHIBITED
  },
  {
    "Non-ASCII multibyte control character U+180E",
    "\xE1\xA0\x8E", NULL, "Nameprep", 0,
    STRINGPREP_CONTAINS_PROHIBITED
  },
  {
    "Zero Width No-Break Space U+FEFF",
    "\xEF\xBB\xBF", ""
  },
  {
    "Non-ASCII control character U+1D175",
    "\xF0\x9D\x85\xB5", NULL, "Nameprep", 0,
    STRINGPREP_CONTAINS_PROHIBITED
  },
  {
    "Plane 0 private use character U+F123",
    "\xEF\x84\xA3", NULL, "Nameprep", 0,
    STRINGPREP_CONTAINS_PROHIBITED
  },
  {
    "Plane 15 private use character U+F1234",
    "\xF3\xB1\x88\xB4", NULL, "Nameprep", 0,
    STRINGPREP_CONTAINS_PROHIBITED
  },
  {
    "Plane 16 private use character U+10F234",
    "\xF4\x8F\x88\xB4", NULL, "Nameprep", 0,
    STRINGPREP_CONTAINS_PROHIBITED
  },
  {
    "Non-character code point U+8FFFE",
    "\xF2\x8F\xBF\xBE", NULL, "Nameprep", 0,
    STRINGPREP_CONTAINS_PROHIBITED
  },
  {
    "Non-character code point U+10FFFF",
    "\xF4\x8F\xBF\xBF", NULL, "Nameprep", 0,
    STRINGPREP_CONTAINS_PROHIBITED
  },
  {
    "Surrogate code U+DF42",
    "\xED\xBD\x82", NULL, "Nameprep", 0,
    STRINGPREP_CONTAINS_PROHIBITED
  },
  {
    "Non-plain text character U+FFFD",
    "\xEF\xBF\xBD", NULL, "Nameprep", 0,
    STRINGPREP_CONTAINS_PROHIBITED
  },
  {
    "Ideographic description character U+2FF5",
    "\xE2\xBF\xB5", NULL, "Nameprep", 0,
    STRINGPREP_CONTAINS_PROHIBITED
  },
  {
    "Display property character U+0341",
    "\xCD\x81", "\xCC\x81"
  },
  {
    "Left-to-right mark U+200E",
    "\xE2\x80\x8E", "\xCC\x81", "Nameprep", 0,
    STRINGPREP_CONTAINS_PROHIBITED
  },
  {
    "Deprecated U+202A",
    "\xE2\x80\xAA", "\xCC\x81", "Nameprep", 0,
    STRINGPREP_CONTAINS_PROHIBITED
  },
  {
    "Language tagging character U+E0001",
    "\xF3\xA0\x80\x81", "\xCC\x81", "Nameprep", 0,
    STRINGPREP_CONTAINS_PROHIBITED
  },
  {
    "Language tagging character U+E0042",
    "\xF3\xA0\x81\x82", NULL, "Nameprep", 0,
    STRINGPREP_CONTAINS_PROHIBITED
  },
  {
    "Bidi: RandALCat character U+05BE and LCat characters",
    "foo\xD6\xBE""bar", NULL, "Nameprep", 0,
    STRINGPREP_BIDI_BOTH_L_AND_RAL
  },
  {
    "Bidi: RandALCat character U+FD50 and LCat characters",
    "foo\xEF\xB5\x90""bar", NULL, "Nameprep", 0,
    STRINGPREP_BIDI_BOTH_L_AND_RAL
  },
  {
    "Bidi: RandALCat character U+FB38 and LCat characters",
    "foo\xEF\xB9\xB6""bar", "foo \xd9\x8e""bar"
  },
  { "Bidi: RandALCat without trailing RandALCat U+0627 U+0031",
    "\xD8\xA7\x31", NULL, "Nameprep", 0,
    STRINGPREP_BIDI_LEADTRAIL_NOT_RAL}
  ,
  {
    "Bidi: RandALCat character U+0627 U+0031 U+0628",
    "\xD8\xA7\x31\xD8\xA8", "\xD8\xA7\x31\xD8\xA8"
  },
  {
    "Unassigned code point U+E0002",
    "\xF3\xA0\x80\x82", NULL, "Nameprep", STRINGPREP_NO_UNASSIGNED,
    STRINGPREP_CONTAINS_UNASSIGNED
  },
  {
    "Larger test (shrinking)",
    "X\xC2\xAD\xC3\x9F\xC4\xB0\xE2\x84\xA1\x6a\xcc\x8c\xc2\xa0\xc2"
    "\xaa\xce\xb0\xe2\x80\x80", "xssi\xcc\x87""tel\xc7\xb0 a\xce\xb0 ",
    "Nameprep"
  },
  {
    "Larger test (expanding)",
    "X\xC3\x9F\xe3\x8c\x96\xC4\xB0\xE2\x84\xA1\xE2\x92\x9F\xE3\x8c\x80",
    "xss\xe3\x82\xad\xe3\x83\xad\xe3\x83\xa1\xe3\x83\xbc\xe3\x83\x88"
    "\xe3\x83\xab""i\xcc\x87""tel\x28""d\x29\xe3\x82\xa2\xe3\x83\x91"
    "\xe3\x83\xbc\xe3\x83\x88"
  },
};


 TOC 

Appendix B. IDNA test vectors in C syntax

In order to avoid having implementors type in the IDNA test vectors above, a C structure with the data is provided.

The name field is the section titles used in this document. The inlen and in field contains Unicode code points. The out field contains expected ToASCII output. The allowunassigned, and usestd3asciirules can be ignored. The toasciirc and tounicoderc field contains expected error codes, where 0 indicates success and the other flags should be self explanatory.

struct idna
{
  char *name;
  size_t inlen;
  unsigned long in[100];
  char *out;
  int allowunassigned;
  int usestd3asciirules;
  int toasciirc;
  int tounicoderc;
} idna[] =
{
  {
    "Arabic (Egyptian)", 17,
    {
  0x0644, 0x064A, 0x0647, 0x0645, 0x0627, 0x0628, 0x062A, 0x0643,
	0x0644, 0x0645, 0x0648, 0x0634, 0x0639, 0x0631, 0x0628, 0x064A,
	0x061F},
      IDNA_ACE_PREFIX "egbpdaj6bu4bxfgehfvwxn", 0, 0, IDNA_SUCCESS,
      IDNA_SUCCESS},
  {
    "Chinese (simplified)", 9,
    {
  0x4ED6, 0x4EEC, 0x4E3A, 0x4EC0, 0x4E48, 0x4E0D, 0x8BF4, 0x4E2D, 0x6587},
      IDNA_ACE_PREFIX "ihqwcrb4cv8a8dqg056pqjye", 0, 0, IDNA_SUCCESS,
      IDNA_SUCCESS},
  {
    "Chinese (traditional)", 9,
    {
  0x4ED6, 0x5011, 0x7232, 0x4EC0, 0x9EBD, 0x4E0D, 0x8AAA, 0x4E2D, 0x6587},
      IDNA_ACE_PREFIX "ihqwctvzc91f659drss3x8bo0yb", 0, 0, IDNA_SUCCESS,
      IDNA_SUCCESS},
  {
    "Czech", 22,
    {
  0x0050, 0x0072, 0x006F, 0x010D, 0x0070, 0x0072, 0x006F, 0x0073,
	0x0074, 0x011B, 0x006E, 0x0065, 0x006D, 0x006C, 0x0075, 0x0076,
	0x00ED, 0x010D, 0x0065, 0x0073, 0x006B, 0x0079},
      IDNA_ACE_PREFIX "Proprostnemluvesky-uyb24dma41a", 0, 0, IDNA_SUCCESS,
      IDNA_SUCCESS},
  {
    "Hebrew", 22,
    {
  0x05DC, 0x05DE, 0x05D4, 0x05D4, 0x05DD, 0x05E4, 0x05E9, 0x05D5,
	0x05D8, 0x05DC, 0x05D0, 0x05DE, 0x05D3, 0x05D1, 0x05E8, 0x05D9,
	0x05DD, 0x05E2, 0x05D1, 0x05E8, 0x05D9, 0x05EA},
      IDNA_ACE_PREFIX "4dbcagdahymbxekheh6e0a7fei0b", 0, 0, IDNA_SUCCESS,
      IDNA_SUCCESS},
  {
    "Hindi (Devanagari)", 30,
    {
  0x092F, 0x0939, 0x0932, 0x094B, 0x0917, 0x0939, 0x093F, 0x0928,
	0x094D, 0x0926, 0x0940, 0x0915, 0x094D, 0x092F, 0x094B, 0x0902,
	0x0928, 0x0939, 0x0940, 0x0902, 0x092C, 0x094B, 0x0932, 0x0938,
	0x0915, 0x0924, 0x0947, 0x0939, 0x0948, 0x0902},
      IDNA_ACE_PREFIX "i1baa7eci9glrd9b2ae1bj0hfcgg6iyaf8o0a1dig0cd", 0, 0,
      IDNA_SUCCESS},
  {
    "Japanese (kanji and hiragana)", 18,
    {
  0x306A, 0x305C, 0x307F, 0x3093, 0x306A, 0x65E5, 0x672C, 0x8A9E,
	0x3092, 0x8A71, 0x3057, 0x3066, 0x304F, 0x308C, 0x306A, 0x3044,
	0x306E, 0x304B},
      IDNA_ACE_PREFIX "n8jok5ay5dzabd5bym9f0cm5685rrjetr6pdxa", 0, 0,
      IDNA_SUCCESS},
  {
    "Russian (Cyrillic)", 28,
    {
  0x043F, 0x043E, 0x0447, 0x0435, 0x043C, 0x0443, 0x0436, 0x0435,
	0x043E, 0x043D, 0x0438, 0x043D, 0x0435, 0x0433, 0x043E, 0x0432,
	0x043E, 0x0440, 0x044F, 0x0442, 0x043F, 0x043E, 0x0440, 0x0443,
	0x0441, 0x0441, 0x043A, 0x0438},
      IDNA_ACE_PREFIX "b1abfaaepdrnnbgefbadotcwatmq2g4l", 0, 0,
      IDNA_SUCCESS, IDNA_SUCCESS},
  {
    "Spanish", 40,
    {
  0x0050, 0x006F, 0x0072, 0x0071, 0x0075, 0x00E9, 0x006E, 0x006F,
	0x0070, 0x0075, 0x0065, 0x0064, 0x0065, 0x006E, 0x0073, 0x0069,
	0x006D, 0x0070, 0x006C, 0x0065, 0x006D, 0x0065, 0x006E, 0x0074,
	0x0065, 0x0068, 0x0061, 0x0062, 0x006C, 0x0061, 0x0072, 0x0065,
	0x006E, 0x0045, 0x0073, 0x0070, 0x0061, 0x00F1, 0x006F, 0x006C},
      IDNA_ACE_PREFIX "PorqunopuedensimplementehablarenEspaol-fmd56a", 0, 0,
      IDNA_SUCCESS},
  {
    "Vietnamese", 31,
    {
  0x0054, 0x1EA1, 0x0069, 0x0073, 0x0061, 0x006F, 0x0068, 0x1ECD,
	0x006B, 0x0068, 0x00F4, 0x006E, 0x0067, 0x0074, 0x0068, 0x1EC3,
	0x0063, 0x0068, 0x1EC9, 0x006E, 0x00F3, 0x0069, 0x0074, 0x0069,
	0x1EBF, 0x006E, 0x0067, 0x0056, 0x0069, 0x1EC7, 0x0074},
      IDNA_ACE_PREFIX "TisaohkhngthchnitingVit-kjcr8268qyxafd2f1b9g", 0, 0,
      IDNA_SUCCESS},
  {
    "Japanese", 8,
    {
  0x0033, 0x5E74, 0x0042, 0x7D44, 0x91D1, 0x516B, 0x5148, 0x751F},
      IDNA_ACE_PREFIX "3B-ww4c5e180e575a65lsy2b", 0, 0, IDNA_SUCCESS,
      IDNA_SUCCESS},
  {
    "Japanese", 24,
    {
  0x5B89, 0x5BA4, 0x5948, 0x7F8E, 0x6075, 0x002D, 0x0077, 0x0069,
	0x0074, 0x0068, 0x002D, 0x0053, 0x0055, 0x0050, 0x0045, 0x0052,
	0x002D, 0x004D, 0x004F, 0x004E, 0x004B, 0x0045, 0x0059, 0x0053},
      IDNA_ACE_PREFIX "-with-SUPER-MONKEYS-pc58ag80a8qai00g7n9n", 0, 0,
      IDNA_SUCCESS},
  {
    "Japanese", 25,
    {
  0x0048, 0x0065, 0x006C, 0x006C, 0x006F, 0x002D, 0x0041, 0x006E,
	0x006F, 0x0074, 0x0068, 0x0065, 0x0072, 0x002D, 0x0057, 0x0061,
	0x0079, 0x002D, 0x305D, 0x308C, 0x305E, 0x308C, 0x306E, 0x5834,
	0x6240},
      IDNA_ACE_PREFIX "Hello-Another-Way--fc4qua05auwb3674vfr0b", 0, 0,
      IDNA_SUCCESS},
  {
    "Japanese", 8,
    {
  0x3072, 0x3068, 0x3064, 0x5C4B, 0x6839, 0x306E, 0x4E0B, 0x0032},
      IDNA_ACE_PREFIX "2-u9tlzr9756bt3uc0v", 0, 0, IDNA_SUCCESS,
      IDNA_SUCCESS},
  {
    "Japanese", 13,
    {
  0x004D, 0x0061, 0x006A, 0x0069, 0x3067, 0x004B, 0x006F, 0x0069,
	0x3059, 0x308B, 0x0035, 0x79D2, 0x524D},
      IDNA_ACE_PREFIX "MajiKoi5-783gue6qz075azm5e", 0, 0, IDNA_SUCCESS,
      IDNA_SUCCESS},
  {
    "Japanese", 9,
    {
  0x30D1, 0x30D5, 0x30A3, 0x30FC, 0x0064, 0x0065, 0x30EB, 0x30F3, 0x30D0},
      IDNA_ACE_PREFIX "de-jg4avhby1noc0d", 0, 0, IDNA_SUCCESS, IDNA_SUCCESS},
  {
    "Japanese", 7,
    {
  0x305D, 0x306E, 0x30B9, 0x30D4, 0x30FC, 0x30C9, 0x3067},
      IDNA_ACE_PREFIX "d9juau41awczczp", 0, 0, IDNA_SUCCESS, IDNA_SUCCESS},
  {
    "Greek", 8,
    {0x03b5, 0x03bb, 0x03bb, 0x03b7, 0x03bd, 0x03b9, 0x03ba, 0x03ac},
    IDNA_ACE_PREFIX "hxargifdar", 0, 0, IDNA_SUCCESS, IDNA_SUCCESS},
  {
    "Maltese (Malti)", 10,
    {0x0062, 0x006f, 0x006e, 0x0121, 0x0075, 0x0073, 0x0061, 0x0127,
     0x0127, 0x0061},
    IDNA_ACE_PREFIX "bonusaa-5bb1da", 0, 0, IDNA_SUCCESS, IDNA_SUCCESS},
  {
    "Russian (Cyrillic)", 28,
    {0x043f, 0x043e, 0x0447, 0x0435, 0x043c, 0x0443, 0x0436, 0x0435,
     0x043e, 0x043d, 0x0438, 0x043d, 0x0435, 0x0433, 0x043e, 0x0432,
     0x043e, 0x0440, 0x044f, 0x0442, 0x043f, 0x043e, 0x0440, 0x0443,
     0x0441, 0x0441, 0x043a, 0x0438},
    IDNA_ACE_PREFIX "b1abfaaepdrnnbgefbadotcwatmq2g4l", 0, 0,
    IDNA_SUCCESS, IDNA_SUCCESS},
};


 TOC 

Intellectual Property Statement

Full Copyright Statement

Acknowledgment