normalize (string)
  • 23 May 2023

Task Purpose

The normalize task returns a copy of a string whose binary representation is in a chosen Unicode normalization form. The form can be one of the following:

Value   Definition
NFC     Canonical Decomposition, followed by Canonical Composition
NFD     Canonical Decomposition
NFKC    Compatibility Decomposition, followed by Canonical Composition
NFKD    Compatibility Decomposition

For additional information, please refer to the Unicode Normalization Forms standard.

In simple terms, normalization ensures that two strings that represent the same text with different binary representations have the same binary value once both are normalized to the same form.
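As a minimal sketch of this idea, the snippet below uses Python's standard unicodedata module as a stand-in for the task (an assumption; the article does not say how the task is implemented). It compares the two common encodings of "é" before and after normalization in each of the four forms.

```python
import unicodedata

# Two ways to write "é": one precomposed code point,
# or the letter "e" followed by a combining acute accent.
composed = "\u00e9"
decomposed = "e\u0301"

# The strings render identically but differ code point for code point.
print(composed == decomposed)  # False

# Once both are normalized to the same form, they compare equal.
# NFC and NFKC yield one code point; NFD and NFKD yield two.
for form in ("NFC", "NFD", "NFKC", "NFKD"):
    a = unicodedata.normalize(form, composed)
    b = unicodedata.normalize(form, decomposed)
    print(form, a == b, len(a))
```

Which form to pick matters less than applying the same form to every string that will later be compared.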

Potential Use Cases

Unicode sometimes has multiple representations of the same character. For example, the letter "e" with the acute accent (é) can be represented in Unicode using either U+00E9 (single code point), or U+0065 and U+0301 together (two code points). This can cause unexpected errors, such as password mismatches that prevent user authentication or the inability to search and sort email addresses in a database. To ensure data is stored and accessed in a consistent manner, use normalize whenever you need a consistent representation of characters with diacritical marks or, with the compatibility forms NFKC and NFKD, need to decompose ligatures or unify half-width and full-width character variants. Normalization does not change letter case. In short, you should normalize and maintain a consistent representation of characters whenever you accept input from users.
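The ligature and width cases can be illustrated with a short sketch. It again uses Python's unicodedata module as a stand-in for the task, and the sample characters are chosen for illustration only. It shows that these mappings happen in the compatibility form NFKC, while the canonical form NFC leaves such characters as they are.

```python
import unicodedata

ligature = "\ufb01"          # "ﬁ" LATIN SMALL LIGATURE FI
fullwidth = "\uff21\uff11"   # fullwidth Latin "A" and digit "1"
halfwidth_kana = "\uff76"    # halfwidth katakana KA

# Canonical normalization does not touch compatibility characters.
assert unicodedata.normalize("NFC", ligature) == ligature

# Compatibility normalization replaces them with their plain equivalents.
assert unicodedata.normalize("NFKC", ligature) == "fi"
assert unicodedata.normalize("NFKC", fullwidth) == "A1"
assert unicodedata.normalize("NFKC", halfwidth_kana) == "\u30ab"  # standard katakana KA

# Letter case is unaffected by any normalization form.
assert unicodedata.normalize("NFKC", "A") == "A"
```

Because compatibility forms discard distinctions such as ligatures and character width, they suit search and matching better than storing text that must round-trip unchanged.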

Properties

Input and output properties are shown below.

Input   Type     Description
str     String   Required. The string to normalize.
form    String   Optional. One of the specified forms for Unicode Normalization: NFC, NFD, NFKC, or NFKD.

Output            Type     Description
normalizedString  String   A string containing the Unicode Normalization Form of the given string.
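To make the shape of these inputs and outputs concrete, here is a hypothetical Python stand-in for the task. The function name, the dictionary return value, and the error handling are illustrative, not the product's API. The default of NFC is also an assumption: the article says form is optional but does not state which form applies when it is omitted.

```python
import unicodedata

VALID_FORMS = ("NFC", "NFD", "NFKC", "NFKD")

def normalize(text: str, form: str = "NFC") -> dict:
    """Hypothetical stand-in for the normalize task.

    text corresponds to the task's str input. The NFC default is an
    assumption, not documented behavior of the task.
    """
    if form not in VALID_FORMS:
        raise ValueError(f"form must be one of {VALID_FORMS}, got {form!r}")
    # Mirror the task's single output property.
    return {"normalizedString": unicodedata.normalize(form, text)}

result = normalize("e\u0301", "NFC")
print(len(result["normalizedString"]))  # 1
```

Rejecting unknown form values up front keeps a typo such as "nfc" from silently producing an unnormalized string.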

Example

In this example, the incoming str variable is "Chloé O'Leary" and the form value is NFC. Note the use of an acute accent in the first name and an apostrophe in the last name.

[Image: normalize-01]

The task returns the result in the normalizedString output.

[Image: normalize-02]
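The same example can be reproduced outside the product as a rough sketch, with Python's unicodedata module standing in for the task. It assumes the incoming name arrives in decomposed form ("e" followed by a combining acute accent); the article does not say which representation the input uses.

```python
import unicodedata

# Assumed input: "Chloé O'Leary" with the accent as a separate combining mark.
incoming = "Chloe\u0301 O'Leary"
normalized = unicodedata.normalize("NFC", incoming)

print(len(incoming))    # 14 code points
print(len(normalized))  # 13 code points: "e" + accent became U+00E9

# The ASCII apostrophe has no decomposition, so NFC leaves it unchanged.
print("'" in normalized)  # True

# List the code points to see the composed "é".
print([f"U+{ord(ch):04X}" for ch in normalized])
```

If the incoming string were already in composed form, NFC would return it unchanged, so applying the task to every input is safe.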

