TIP #345: Kill the 'identity' Encoding


TIP:345
Title:Kill the 'identity' Encoding
Version:$Revision: 1.2 $
Author:Alexandre Ferrieux <alexandre dot ferrieux at gmail dot com>
State:Draft
Type:Project
Tcl-Version:8.7
Vote:Pending
Created:Thursday, 05 February 2009
Keywords:Tcl, encoding, invalid UTF-8

Abstract

This TIP proposes to remove the 'identity' encoding which is the Pandora's Box of invalid UTF-8 string representations.

Background

The contract of string representations in Tcl states that the bytes field (the strep) of a Tcl_Obj must be a valid UTF-8 byte sequence. Violating it leads at best to inconsistent and shimmer-sensitive string comparisons. Fortunately, nearly all of the Tcl code takes careful steps to enforce it. With one exception: the 'identity' encoding. Indeed, this encoding allows any byte sequence to be copied verbatim into the strep of a value, as a side-effect of a strep computation on a ByteArray with [encoding system]=="identity", or through [encoding convertfrom identity]. Hence an invalid UTF-8 sequence can easily make it to the strep and start wreaking havoc.

Proposed Change

This TIP proposes to simply close that single window to the dark side.

Rationale

The risk of compatibility breakage is inordinately mild in that case, since it has only ever been documented in tcltest.

Reference Example

See Bug 2564363 [1]

Copyright

This document has been placed in the public domain.


Powered by Tcl[Index] [History] [HTML Format] [Source Format] [LaTeX Format] [Text Format] [XML Format] [*roff Format (experimental)] [RTF Format (experimental)]

TIP AutoGenerator - written by Donal K. Fellows