UTF-8 does not reset state when returning error #359

@ChALkeR

What is the issue with the Encoding Standard?

Same as #358, but for the Unicode BOM.
If the proposal in #358 is to reset state on errors, what should happen to the BOM seen flag?

I am not arguing that it should be reset, but there is definitely an issue and an inconsistency here.


Platform status is highly inconsistent:

// Returns the length of the decoded string, or 'e' if decode() throws
const r = (d, ...a) => {
  try {
    return d.decode(...a).length
  } catch {}
  return 'e'
}

const a = new TextDecoder('utf8', { fatal: true })
console.log('A',
  r(a, Uint8Array.of(0xef, 0xbb, 0xbf, 0xff), { stream: true }),
  r(a, Uint8Array.of(0xef, 0xbb, 0xbf), { stream: true }), // error does not stick in Chrome/Safari
)

const b = new TextDecoder('utf8', { fatal: true })
console.log('B',
  r(b, Uint8Array.of(0xef, 0xbb, 0xbf, 0xef), { stream: true }),
  r(b, Uint8Array.of(0xef, 0xbb, 0xbf), { stream: true }),
  r(b, Uint8Array.of(0xef, 0xbb, 0xbf), { stream: true }), // error sticks in Chrome / Safari
  r(b, Uint8Array.of(0xbb, 0xbf), { stream: true }),
  r(b, Uint8Array.of(), { stream: true }),
)

const c = new TextDecoder('utf8', { fatal: true })
console.log('C',
  r(c, Uint8Array.of(0xef, 0xbb, 0xbf), { stream: true }),
  r(c, Uint8Array.of(0xff), { stream: true }),
  r(c, Uint8Array.of(0xef, 0xbb, 0xbf), { stream: true }),
)

// Bonus: if BOM is not reset, is it processed on errors?
const d = new TextDecoder('utf8', { fatal: true })
const e = new TextDecoder('utf8', { fatal: true })
console.log('D',
  r(d, Uint8Array.of(0x20, 0xff), { stream: true }),
  r(d, Uint8Array.of(0xef, 0xbb, 0xbf), { stream: true }),
  r(d, Uint8Array.of(0xff), { stream: true }),
  r(d, Uint8Array.of(0xef, 0xbb, 0xbf), { stream: true }),
  r(e, Uint8Array.of(0xff), { stream: true }),
  r(e, Uint8Array.of(0xef, 0xbb, 0xbf), { stream: true }),
)

Chrome: the first error does not stick, the second error sticks

A e 0
B 0 e e e e
C 0 e 1
D e 0 e 1 e 0

WebKit: the first error does not stick, the second error sticks

A e 1
B 0 e e e e
C 0 e 1
D e 1 e 1 e 0

Firefox, Servo, Deno, Static Hermes: errors do not stick, BOM does not get reset, BOM seen is set on errors

A e 1
B 0 e 1 e 0
C 0 e 1
D e 1 e 1 e 1

Node.js: errors do not stick

A e 0
B 0 e 1 e 0
C 0 e 1
D e 0 e 1 e 0

Bun: just broken

A e 0
B 0 0 0 1 0
C 0 e 0
D e 0 e 0 e 0
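
Until this is settled, application code that needs the same result on every platform listed above could avoid relying on post-error decoder state at all. The following is a minimal sketch of that idea, not a proposal for the spec: `ResettingDecoder` is a hypothetical wrapper (the name and shape are assumptions, not part of any platform API) that replaces the underlying `TextDecoder` after a throw, so pending bytes and BOM handling both start from scratch.

```javascript
// Hypothetical helper, not part of the Encoding Standard or any platform API.
// Assumption: discarding all decoder state after a fatal error is acceptable.
class ResettingDecoder {
  constructor(label = 'utf-8') {
    this.label = label
    this.decoder = new TextDecoder(label, { fatal: true })
  }

  decode(chunk, options) {
    try {
      return this.decoder.decode(chunk, options)
    } catch (err) {
      // Start over with a fresh decoder so no pending bytes or BOM state
      // from the failed call can leak into the next one.
      this.decoder = new TextDecoder(this.label, { fatal: true })
      throw err
    }
  }
}

// Usage: after an error, the next call behaves like the first call
// on a new TextDecoder.
const safe = new ResettingDecoder()
try {
  safe.decode(Uint8Array.of(0xff), { stream: true })
} catch {}
console.log(safe.decode(Uint8Array.of(0xef, 0xbb, 0xbf), { stream: true }).length)
```

The last call runs on a fresh decoder, so it should match the first column of row C above, which is the same on every platform tested. The trade-off is that any valid bytes buffered before the error are lost as well.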
