What is the issue with the Encoding Standard?
Same as #358, but for the Unicode BOM.
If the proposal in #358 is to reset state on errors, then what should happen to the BOM seen flag?
I'm not arguing that it should be reset, but there is definitely an issue and an inconsistency here.
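For context, here is a minimal sketch of how BOM seen behaves on the non-error path, going by my reading of the `decode()` steps in the Encoding Standard: the first U+FEFF of a stream is stripped, later ones are kept, and a flushing call (one without `stream: true`) ends the stream so that the next call starts fresh. The helper name `bomBaseline` is made up for illustration.

```javascript
// Illustrative helper (not part of any API): returns what each call decodes to.
function bomBaseline() {
  const dec = new TextDecoder('utf-8', { fatal: true })
  const bomA = Uint8Array.of(0xef, 0xbb, 0xbf, 0x41) // BOM followed by "A"
  return [
    dec.decode(bomA, { stream: true }), // leading BOM is stripped, BOM seen gets set
    dec.decode(bomA, { stream: true }), // BOM seen is already set, so U+FEFF is kept
    dec.decode(), // flushing call: ends the stream
    dec.decode(bomA, { stream: true }), // new stream: BOM is stripped again
  ]
}

console.log(bomBaseline().map((s) => s.length))
```

The question in this issue is what happens to that same flag once a call in the middle throws.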
Platform status is highly inconsistent:
const r = (d, ...a) => {
  try {
    return d.decode(...a).length
  } catch {}
  return 'e'
}

const a = new TextDecoder('utf8', { fatal: true })
console.log('A',
  r(a, Uint8Array.of(0xef, 0xbb, 0xbf, 0xff), { stream: true }),
  r(a, Uint8Array.of(0xef, 0xbb, 0xbf), { stream: true }), // error does not stick in Chrome/Safari
)

const b = new TextDecoder('utf8', { fatal: true })
console.log('B',
  r(b, Uint8Array.of(0xef, 0xbb, 0xbf, 0xef), { stream: true }),
  r(b, Uint8Array.of(0xef, 0xbb, 0xbf), { stream: true }),
  r(b, Uint8Array.of(0xef, 0xbb, 0xbf), { stream: true }), // error sticks in Chrome / Safari
  r(b, Uint8Array.of(0xbb, 0xbf), { stream: true }),
  r(b, Uint8Array.of(), { stream: true }),
)

const c = new TextDecoder('utf8', { fatal: true })
console.log('C',
  r(c, Uint8Array.of(0xef, 0xbb, 0xbf), { stream: true }),
  r(c, Uint8Array.of(0xff), { stream: true }),
  r(c, Uint8Array.of(0xef, 0xbb, 0xbf), { stream: true }),
)

// Bonus: if BOM is not reset, is it processed on errors?
const d = new TextDecoder('utf8', { fatal: true })
const e = new TextDecoder('utf8', { fatal: true })
console.log('D',
  r(d, Uint8Array.of(0x20, 0xff), { stream: true }),
  r(d, Uint8Array.of(0xef, 0xbb, 0xbf), { stream: true }),
  r(d, Uint8Array.of(0xff), { stream: true }),
  r(d, Uint8Array.of(0xef, 0xbb, 0xbf), { stream: true }),
  r(e, Uint8Array.of(0xff), { stream: true }),
  r(e, Uint8Array.of(0xef, 0xbb, 0xbf), { stream: true }),
)

Chrome: first error did not get stuck, second error got stuck
A e 0
B 0 e e e e
C 0 e 1
D e 0 e 1 e 0
WebKit: first error did not get stuck, second error got stuck
A e 1
B 0 e e e e
C 0 e 1
D e 1 e 1 e 0
Firefox, Servo, Deno, Static Hermes: errors do not stick, BOM seen does not get reset, BOM seen is set on errors
A e 1
B 0 e 1 e 0
C 0 e 1
D e 1 e 1 e 1
Node.js: errors do not stick
A e 0
B 0 e 1 e 0
C 0 e 1
D e 0 e 1 e 0
Bun: just broken
A e 0
B 0 0 0 1 0
C 0 e 0
D e 0 e 0 e 0
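Until this is settled, the only way I see to get the same result everywhere is to throw the decoder away after an error. A rough sketch, assuming that discarding the whole decoder is an acceptable stand-in for "reset state on errors" (this is not necessarily what #358 proposes); `ResettingDecoder` is a made-up name, not a platform API.

```javascript
// Hypothetical wrapper: replaces the decoder after any thrown error, so no
// state (pending bytes, BOM seen, a stuck error) survives the failure.
class ResettingDecoder {
  constructor(label = 'utf-8') {
    this.label = label
    this.dec = new TextDecoder(label, { fatal: true })
  }

  decode(chunk, options) {
    try {
      return this.dec.decode(chunk, options)
    } catch (err) {
      this.dec = new TextDecoder(this.label, { fatal: true })
      throw err
    }
  }
}

const rd = new ResettingDecoder()
const lengths = [
  Uint8Array.of(0xef, 0xbb, 0xbf, 0xff), // BOM followed by an invalid byte
  Uint8Array.of(0xef, 0xbb, 0xbf), // BOM alone, now hitting a fresh decoder
].map((bytes) => {
  try {
    return rd.decode(bytes, { stream: true }).length
  } catch {
    return 'e'
  }
})
console.log(lengths)
```

With this wrapper the second call always sees a fresh decoder, so the BOM is stripped again regardless of which behavior the platform implements natively.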