What is Canonicalization?
Canonicalization is the process of converting data that has more than one possible representation into a standard, approved form, called the canonical form. The conversion ensures that the data conforms to canonical rules. A canonical form makes it possible to compare different representations for equivalence, to count the number of distinct data structures, to impose a meaningful sort order, and to improve the efficiency of algorithms by eliminating repeated computations.
Canonicalization is used in many Internet and computer applications to generate canonical data from non-canonical information. Canonical representations of data are commonly used in search engine optimization (SEO), web servers, Unicode, and XML.
The term is also abbreviated as C14N and is sometimes called standardization or normalization.
In SEO, URL canonicalization deals with web content that is reachable through more than one URL. This can lead to inconsistent search results, because the search engine may not know which URL to display. Canonicalization chooses the best URL from several candidates, a situation that most often arises for home pages. Although certain URLs appear to refer to the same page, web servers may return different results for them. Search engines consider only the URL in canonical form.
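As an illustration, the following is a minimal sketch of URL canonicalization in Python using the standard urllib.parse module. The function name canonical_url, the list of index page names, and the specific rules (lowercase scheme and host, drop default ports, drop a trailing index page, drop the fragment) are assumptions made for this example. They are not the rules of any particular search engine, and the sketch ignores user info and IPv6 host literals.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical policy choices for this sketch.
DEFAULT_PORTS = {"http": 80, "https": 443}
INDEX_PAGES = ("index.html", "index.htm", "index.php")


def canonical_url(url: str) -> str:
    """Reduce equivalent-looking URLs to one form under the policy above."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    # Keep the port only when it differs from the scheme's default.
    if parts.port is not None and parts.port != DEFAULT_PORTS.get(scheme):
        host = f"{host}:{parts.port}"
    # An empty path and "/" name the same resource.
    path = parts.path or "/"
    # Treat a trailing index page as the directory itself.
    for index in INDEX_PAGES:
        if path.endswith("/" + index):
            path = path[: -len(index)]
            break
    # Drop the fragment; it is not sent to the server.
    return urlunsplit((scheme, host, path, parts.query, ""))
```

Under these assumed rules, "HTTP://Example.com:80/index.html#top" and "http://example.com" both map to the same canonical URL, so they can be compared with a plain string comparison.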
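Canonicalization of file names is important for computer security. Some web servers have a security rule that only files in a specific directory may be executed. A naive implementation executes a file whenever its path contains the name of that directory. Particular care must be taken to reduce the file name to a single, unambiguous form before the check is made; otherwise a path that contains the directory name but uses components such as ".." can refer to a file outside the directory. One such vulnerability is called directory traversal.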
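The check described above can be sketched in Python as follows. The function name is_inside and the example paths are hypothetical. The sketch assumes that canonicalizing both paths with os.path.realpath before comparing them is an acceptable policy; a real server would need further safeguards.

```python
import os


def is_inside(base_dir: str, requested: str) -> bool:
    """Return True only if `requested` resolves to a location under `base_dir`."""
    # realpath collapses "." and ".." components and resolves symbolic links.
    base = os.path.realpath(base_dir)
    target = os.path.realpath(os.path.join(base, requested))
    # commonpath compares whole path components, unlike a string prefix test.
    return os.path.commonpath([base, target]) == base
```

With a base directory of "/srv/app/scripts", the request "../secret.py" produces a path string that still contains the directory name, yet its canonical form lies outside the directory, so the function returns False.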
Several encodings in the Unicode standard, such as UTF-8, are variable-length. Software must therefore consider every way in which a character in a string could be encoded, which makes string validation more complex. If an implementation does not take all of these encodings into account, errors can result. The problem can be eliminated by using a single encoding for each character. A practical alternative is for software to check whether a string is canonical and to reject strings that are not.
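A minimal sketch of the "check and reject" approach in Python is shown below. The function name is_canonical_text is hypothetical, and the sketch makes two assumptions: that the input is expected to be UTF-8, and that Unicode Normalization Form C (NFC) is the canonical form chosen for the decoded text.

```python
import unicodedata


def is_canonical_text(data: bytes) -> bool:
    """Accept only strictly valid UTF-8 whose text is already in NFC."""
    try:
        # Python's strict UTF-8 decoder rejects overlong byte sequences.
        text = data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    # Assumption: NFC is the canonical form for this sketch.
    return unicodedata.normalize("NFC", text) == text
```

For example, the precomposed character U+00E9 is accepted, while the sequence "e" followed by the combining accent U+0301 is rejected because it is not in NFC, and the overlong byte sequence C0 AF is rejected because it is not valid UTF-8.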
A canonical XML document is an XML document in canonical XML form, as defined by the Canonical XML specification. Canonicalization in XML eliminates white space within tags, uses a particular character encoding, sorts namespace references, and eliminates redundant ones. It also removes XML and DOCTYPE declarations and converts relative URLs to absolute URLs.
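The effect can be demonstrated with the canonicalize function in Python's standard xml.etree.ElementTree module (available since Python 3.8), which implements the C14N 2.0 version of the specification. The two sample documents below are made up for this illustration; they differ in attribute order, quoting, white space inside the tag, and the presence of an XML declaration.

```python
from xml.etree.ElementTree import canonicalize

# Two serializations of the same document (made-up sample data).
doc_a = '<?xml version="1.0"?><doc b="2"   a="1" />'
doc_b = "<doc a='1' b='2'></doc>"

# canonicalize returns the canonical form as a string when no output file is given.
canon_a = canonicalize(doc_a)
canon_b = canonicalize(doc_b)

# Equivalent documents can now be compared as plain strings.
same_document = canon_a == canon_b
```

Because both inputs are reduced to the same canonical text, a simple string comparison is enough to establish that they represent the same document.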