Title:Kill the 'identity' Encoding
Version:$Revision: 1.2 $
Author:Alexandre Ferrieux <alexandre dot ferrieux at gmail dot com>
Created:Thursday, 05 February 2009
Keywords:Tcl, encoding, invalid UTF-8


This TIP proposes to remove the 'identity' encoding, which is the Pandora's Box of invalid UTF-8 string representations.


The contract of string representations in Tcl states that the bytes field (the strep) of a Tcl_Obj must be a valid UTF-8 byte sequence. Violating it leads, at best, to inconsistent and shimmer-sensitive string comparisons. Fortunately, nearly all of the Tcl code takes careful steps to enforce it, with one exception: the 'identity' encoding. This encoding allows any byte sequence to be copied verbatim into the strep of a value, either as a side effect of computing the strep of a ByteArray while [encoding system] is "identity", or directly through [encoding convertfrom identity]. Hence an invalid UTF-8 sequence can easily make its way into the strep and start wreaking havoc.
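
The inconsistency described above can be illustrated outside Tcl with a small Python model. This is only a sketch, not Tcl's implementation: the function names are hypothetical, and it assumes that a reader of an invalid strep falls back to treating each undecodable byte as the character with the same code point. Under that assumption, two streps that differ byte for byte can still compare equal once they are converted to characters, so the result of a comparison depends on which representation is consulted.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data is a well-formed (strict) UTF-8 byte sequence."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def lenient_decode(data: bytes) -> str:
    """Decode byte by byte; an undecodable byte becomes the character
    with the same code point (the fallback assumed for this model)."""
    out = []
    i = 0
    while i < len(data):
        # Try the shortest well-formed UTF-8 sequence starting at i.
        for width in (1, 2, 3, 4):
            chunk = data[i:i + width]
            try:
                out.append(chunk.decode("utf-8"))
                i += len(chunk)
                break
            except UnicodeDecodeError:
                continue
        else:
            # No valid sequence starts here: keep the raw byte value.
            out.append(chr(data[i]))
            i += 1
    return "".join(out)


# A well-formed strep for U+00E9, and the invalid strep that a verbatim
# copy of the single byte 0xE9 would leave behind.
good = "\u00e9".encode("utf-8")
bad = b"\xe9"

bytes_equal = good == bad                                   # byte-wise view
chars_equal = lenient_decode(good) == lenient_decode(bad)   # character view
```

In this model the byte-wise comparison and the character-wise comparison disagree for the same pair of values, which is the kind of shimmer-sensitive behaviour a valid-UTF-8 contract is meant to rule out.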

Proposed Change

This TIP proposes to close that single window to the dark side by removing the 'identity' encoding altogether.


The risk of compatibility breakage is very low in this case, since the 'identity' encoding has only ever been documented in tcltest.

Reference Example

See Bug 2564363 [1]


This document has been placed in the public domain.

