normalize (string)
- Updated on 23 May 2023
Task Purpose
The normalize task converts a string to a specified Unicode normalization form. It returns a string whose code point sequence, and therefore its binary representation, is in one of the following forms:
Value | Definition |
---|---|
NFC | Canonical Decomposition, followed by Canonical Composition |
NFD | Canonical Decomposition |
NFKC | Compatibility Decomposition, followed by Canonical Composition |
NFKD | Compatibility Decomposition |
For additional information, please refer to the Unicode Normalization Forms standard.
In simple terms, normalization ensures that two strings that represent the same text with different code point sequences have the same binary value after both are normalized to the same form.
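To make the four forms concrete, the sketch below normalizes one sample string to each form and prints the resulting code points. It uses Python's standard `unicodedata` module as a stand-in for the task; the sample string and the `forms` variable name are illustrative assumptions, not part of the task itself.

```python
import unicodedata

# Sample text: the ligature "ﬁ" (U+FB01), then "anc", then a precomposed "é" (U+00E9).
sample = "\ufb01anc\u00e9"

# Normalize the same input to each of the four forms.
forms = {
    form: unicodedata.normalize(form, sample)
    for form in ("NFC", "NFD", "NFKC", "NFKD")
}

# Canonical forms (NFC, NFD) keep the ligature; compatibility forms
# (NFKC, NFKD) replace it with "f" + "i".
# Decomposed forms (NFD, NFKD) split "é" into "e" + U+0301; composed forms
# (NFC, NFKC) keep it as the single code point U+00E9.
for form, text in forms.items():
    code_points = " ".join(f"U+{ord(ch):04X}" for ch in text)
    print(f"{form}: {len(text)} code points: {code_points}")
```

The strings look nearly the same when displayed, but their lengths range from five to seven code points, which is why comparing text that was normalized to different forms (or not normalized at all) can fail.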
Potential Use Cases
Unicode sometimes has multiple representations of the same character. For example, the letter "e" with an acute accent (é) can be represented either as U+00E9 (a single code point) or as U+0065 followed by U+0301 (two code points). This can cause unexpected errors, such as password mismatches that prevent user authentication or the inability to search and sort email addresses in a database. To ensure data is stored and accessed in a consistent manner, use normalize whenever you need a consistent representation of characters with diacritical marks or, with the compatibility forms (NFKC and NFKD), need to decompose ligatures or convert between half-width and full-width characters. Normalization does not change letter case, so case-insensitive matching needs a separate case-conversion step. In short, normalize whenever you accept input from users so that characters are represented consistently.
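The mismatch described above can be reproduced in a few lines. This is a minimal sketch that uses Python's standard `unicodedata` module rather than the task itself; the helper name `same_text` and the sample values are hypothetical.

```python
import unicodedata


def same_text(a: str, b: str, form: str = "NFC") -> bool:
    """Compare two strings after normalizing both to the same form."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


precomposed = "caf\u00e9"   # é as the single code point U+00E9
combining = "cafe\u0301"    # e (U+0065) followed by combining acute (U+0301)

raw_match = precomposed == combining                  # False: code points differ
normalized_match = same_text(precomposed, combining)  # True: same form, same value

# Compatibility normalization also unifies width variants.
wide_ka = unicodedata.normalize("NFKC", "\uff76")  # half-width katakana KA -> U+30AB
ascii_a = unicodedata.normalize("NFKC", "\uff21")  # full-width Latin A -> "A"

# Letter case is left untouched by every normalization form.
case_match = same_text("\u00c9", "\u00e9")  # False: É and é remain distinct
```

Normalizing both the stored value and the incoming value to the same form before comparing is what makes the check reliable; normalizing only one side can still produce a mismatch.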
Properties
Input and output properties are shown below.
Input | Type | Description |
---|---|---|
str | String | Required. The string to normalize. |
form | String | Optional. One of the specified forms for Unicode Normalization: NFC, NFD, NFKC, or NFKD. |
Output | Type | Description |
---|---|---|
normalizedString | String | A string containing the Unicode Normalization Form of the given string. |
Example
In this example, the incoming str variable is "Chloé O'Leary" and the form value is NFC. Note the use of an acute accent in the first name and an apostrophe in the last name.
The task returns the result in the normalizedString output.
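The behavior of this example can be approximated outside the platform with the sketch below. The function `normalize_task` is a hypothetical stand-in built on Python's `unicodedata` module, not the task's actual implementation. The NFC default used when form is omitted is an assumption, since this page does not state the task's default.

```python
import unicodedata

VALID_FORMS = ("NFC", "NFD", "NFKC", "NFKD")


def normalize_task(str_value: str, form: str = "NFC") -> dict:
    """Hypothetical stand-in for the task: takes str and form, returns normalizedString.

    The NFC default is an assumption; the task's real default is not documented here.
    """
    if form not in VALID_FORMS:
        raise ValueError(f"form must be one of {VALID_FORMS}, got {form!r}")
    return {"normalizedString": unicodedata.normalize(form, str_value)}


# Input where "é" is already the single code point U+00E9: NFC leaves it unchanged.
result = normalize_task("Chlo\u00e9 O'Leary", "NFC")

# Input where "é" arrives as "e" + U+0301: NFC composes it into U+00E9.
from_decomposed = normalize_task("Chloe\u0301 O'Leary", "NFC")

# Both inputs produce the same output string.
outputs_match = result["normalizedString"] == from_decomposed["normalizedString"]
```

The apostrophe (U+0027) has no decomposition, so it passes through every form unchanged; only the accented letter is affected.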