Skip to content

KV table_id namespace isolation#49

Open
heifner wants to merge 13 commits intofeature/min-wasmfrom
feature/kv-table-id
Open

KV table_id namespace isolation#49
heifner wants to merge 13 commits intofeature/min-wasmfrom
feature/kv-table-id

Conversation

@heifner
Copy link
Copy Markdown
Contributor

@heifner heifner commented Apr 9, 2026

Summary

Replace per-table key_format (uint8, 0/1) with table_id (uint16, DJB2 hash of table name % 65536). Each table and secondary index gets a unique table_id computed at compile time, passed as the first parameter to all KV intrinsics. This eliminates table-name bytes from keys, saving 8 bytes per row.

Adds kv::scoped_table — same API as kv::table but with mandatory scope, producing byte-identical primary keys to kv_multi_index. Enables drop-in migration from kv_multi_index with 18-38% WASM size reduction.

Companion PR: Wire-Network/wire-sysio#290 (chain-side table_id support + unified get_table_rows API)

kv::scoped_table (new)

  • table_impl<..., bool Scoped> internal class with if constexpr — zero overhead for unscoped, no branch for scoped
  • kv::scoped_table inherits table_impl<..., true>, requires (code, scope) constructor
  • Primary keys: [scope:8B BE][K encoded] — byte-identical to kv_multi_index
  • Secondary keys: [scope:8B BE][sec_value] with check_scope() boundary enforcement
  • Secondary lower_bound/upper_bound check scope after positioning
  • scope_iterator for enumerating distinct scopes — scope_lower_bound(code, scope) / scope_end()
  • Unscoped pri_key in secondary index (saves 8B/row vs storing scoped key)
  • cbegin()/cend() on secondary_index_view

kv::table improvements

  • upsert(payer, key, default_value, updater_lambda) — optimal insert-or-modify in 2 intrinsic calls (1 kv_get + 1 kv_set)
  • Custom error message on emplace(), erase(), modify() — e.g. emplace(payer, key, val, "token already exists")
  • encode_be64()/decode_be64() free functions in kv_utils.hpp (eliminates 12 duplicated BE loops)
  • idx_check_scope() shared helper in secondary_index_view
  • same_payer moved to kv_utils.hpp (accessible from all table types, not just kv_multi_index)

ABI generation

  • Recognize scoped_table in abigen, prepend "scope"/"name" to key_names/key_types
  • Emit secondary_indexes for multi_index/kv_multi_index (positional table_ids via compute_mi_sec_table_id_from_raw). Previously only kv::table/kv::scoped_table had this — now legacy multi_index tables can be queried by named index via get_table_rows.
  • Refactored extract_secondary_index into a shared helper used by both kv::table (hashed) and multi_index (positional) extraction paths
  • Fix checksum256fixed_bytes<32> translation bug: when a template arg is retrieved via ClassTemplateSpecializationDecl::getTemplateArgs()[i].getAsType(), Clang returns the canonical type (typedef sugar stripped). For typedef'd specializations like checksum256, the TemplateSpecializationType wrapper is gone, leaving a bare RecordType. The existing is_template_specialization only handled the sugared form, so secondary indexes with checksum256 keys emitted "fixed_bytes<32>" instead of "checksum256". New is_class_template_specialization_decl helper + translate_type branch walks the canonical decl directly and runs through the alias table — but only if the table actually has a mapping, so types like std::string (whose typedef-preserved name is already aliased) aren't intercepted.
  • Toolchain golden tests for both scoped_table and multi_index ABI output, plus extended kv_table_sec_indexes with a checksum256 secondary index to cover the canonical-RecordType path on the kv::table side too.

Library changes (from previous commits)

  • kv_utils.hppbe_key_stream, ser_buf, key_buf, bounds checks, encode_be64/decode_be64, same_payer
  • kv_it_handle.hpp — RAII iterator handles
  • kv_constants.hppcompute_table_id, compute_sec_table_id, compute_mi_sec_table_id
  • hash_id.hpp_i literal for long table names
  • kv_global.hpp — Fix remove() semantics
  • kv_multi_index.hpp — Scope in secondary keys, includes kv_utils.hpp for same_payer
  • kv_table.hpptable_impl refactor, scope plumbing, all key encode/decode methods

Cleanup

  • Remove dead files in tools/include/sysio/: codegen.hpp, gen.hpp, ppcallbacks.hpp, error_emitter.hpp. These were left over from the LLVM 13 / cdt-llvm era — companion tools/include/sysio/abigen.hpp was deleted in 6bb292c but these were missed. They reference custom AST methods like decl->isSysioIgnore() that no longer exist on vanilla LLVM 18 and cannot compile against the current toolchain. Nothing in the active build includes them.

Documentation

  • kv-scoped-table.md — Full reference: constructor, key layout, scope iteration, migration guide from multi_index, API diff (payer-first, modify forms)
  • kv-table.md — upsert lambda pattern with intrinsic-call comparison table
  • kv-storage-guide.md — Updated decision matrix with scoped_table column
  • kv-multi-index.md — Points to scoped_table as recommended replacement

…hardening

Replace per-table key_format (uint8, 0/1) with table_id (uint16, DJB2
hash of table name % 65536). Each table and secondary index gets a
unique table_id computed at compile time, passed as the first parameter
to all KV intrinsics. This eliminates table-name bytes from keys,
saving 8 bytes per row.

Library changes:
- Extract be_key_stream, ser_buf, key_buf into kv_utils.hpp shared by
  kv_table, kv_multi_index, and kv_raw_table (deleted as separate file)
- Add bounds checks to all be_key_stream fixed-size write methods
- Add kv_it_handle.hpp RAII wrapper for iterator/index handles
- Add hash_id.hpp with _i literal for table names > 13 chars
- Add compute_table_id, compute_sec_table_id, compute_mi_sec_table_id
  to kv_constants.hpp (DJB2 hash matching chain-side implementation)
- Make pack_size and SYSLIB_SERIALIZE constexpr for is_fixed_serializable_v
- kv::global::remove() uses current_receiver() for exists check
- kv_multi_index::available_primary_key() converted to RAII it_handle
- Fix hardcoded 256 → kv_value_stack_size in kv_table.hpp

ABI generation:
- Emit table_id for each table in ABI
- Emit secondary_indexes array with per-index name, key_type, table_id
- Emit key_names/key_types metadata for kv::table and kv::global
- Support hash_id table names in abigen
- Compile-time static_assert for table_id collisions (kv::table)

Documentation and examples:
- Add kv_global_example, kv_table_example, hash_id_example
- Rewrite kv-storage-guide.md with table_id, key encoding, KV_KEY_BUF_CAP
- Update all KV docs for new API surface
- Remove kv-indexed-table.md and kv-raw-table.md (consolidated)

Tests:
- Split test_multi_index into two contracts (net limit)
- Add cross-scope secondary index isolation tests
- Add kv_global POD/string, kv_singleton, kv_raw_table boundary tests
- Add hash_id integration and abigen tests
- Add table_id collision abigen-fail test
heifner added 3 commits April 9, 2026 11:55
…lti_index

Introduces kv::scoped_table<TableName, K, V, Indices...> as a thin wrapper
over table_impl<..., Scoped=true>. Primary keys are [scope:8B BE][K encoded],
matching multi_index's layout for drop-in migration. All scope branches use
if constexpr — zero overhead for unscoped kv::table.

Key changes:
- kv_table.hpp: Rename table → table_impl<..., bool Scoped>, add scope
  members, create_primary_it() with prefix, scoped secondary iteration
  with check_scope(), unscoped pri_key in secondary index (saves 8B/row)
- kv_scoped_table.hpp: New file — scoped_table subclass + scope_iterator
  for enumerating distinct scopes via scope_lower_bound()/scope_end()
- abigen.hpp: Recognize scoped_table, prepend "scope"/"name" to key_names
- 22 integration test cases covering scope isolation, secondary index
  scoping, bounds, reverse iteration, kvcompat, scope iteration, etc.
- Toolchain ABI test verifying key_names includes "scope"
- Docs: kv-scoped-table.md, updated storage guide and multi_index doc
Optimal insert-or-modify in 2 intrinsic calls (1 kv_get + 1 kv_set).
If key doesn't exist, inserts default_value. If key exists, reads old
value, applies updater lambda, writes back. Handles secondary index
updates correctly.

This is the recommended pattern for common operations like adding token
balances where you create a zero-balance row on first deposit and
increment on subsequent deposits.
- Fix secondary_index_view::lower_bound/upper_bound to check_scope
  after positioning — prevents cross-scope data leakage when no
  matching key exists in the current scope
- Extract encode_be64/decode_be64 free functions to kv_utils.hpp
  (eliminates 9 duplicated BE encoding loops)
- Extract idx_check_scope static helper in secondary_index_view
  (shared by const_iterator and key_iterator)
- Move same_payer to kv_utils.hpp (available to all table types)
- Remove redundant operator!= (C++20 synthesizes from ==)
- Add 7 negative tests: duplicate emplace, custom error message,
  erase/modify/get/require_find missing key, deref end iterator
- Add cross-scope secondary find isolation test (secfindiso)
- Add secondary lower_bound scope boundary test (seclbbug)
- Rename crossread → explcode (test only verifies explicit code param)
heifner added 2 commits April 10, 2026 07:33
Previously only kv::table/kv::scoped_table had secondary_indexes
in the generated ABI. multi_index and kv_multi_index template args
weren't extracted, so the unified get_table_rows API couldn't
resolve named secondary index queries on these table types.

Adds positional table_id computation (compute_mi_sec_table_id_from_raw)
matching the chain-side formula. Extracts indexed_by<Name, Extractor>
template args from positions 2..N.

Refactors the existing kv::table secondary index extraction into a
shared extract_secondary_index member helper to eliminate the ~25 lines
of duplicated logic between the two paths.

New abigen-pass test verifies multi_index with 3 secondary indices
(uint64, uint64, checksum256) emits correct ABI with positional
table_ids.
Fix bug where multi_index secondary indexes with checksum256 (typedef
for fixed_bytes<32>) emitted "fixed_bytes<32>" instead of "checksum256"
in the generated ABI.

Root cause: when a template arg is retrieved via
ClassTemplateSpecializationDecl::getTemplateArgs()[i].getAsType(),
Clang returns the canonical type. For typedef'd template specializations
like checksum256, the TemplateSpecializationType sugar is stripped,
leaving a bare RecordType pointing at a ClassTemplateSpecializationDecl.
generation_utils::is_template_specialization only handled
TemplateSpecializationType (and ElaboratedType wrapping it), so the
unnamed-template fallback in translate_type never fired and the type
fell through to _translate_type which returned "fixed_bytes<32>"
unchanged.

Fix: add is_class_template_specialization_decl helper that handles the
canonical RecordType form, and a new branch in translate_type that
walks the decl directly to build template_name_arg0_arg1... and run it
through the alias table. The branch only returns its result when the
alias table actually translates the key, so types like std::string
(whose typedef-preserved name basic_string<char> is already aliased)
are not intercepted.

Test coverage:
- multi_index_sec_indexes: checksum256 as struct field + multi_index
  secondary index extractor return type (expected ABI updated from the
  buggy fixed_bytes<32> to the correct checksum256)
- kv_table_sec_indexes: extended with checksum256 hash field and byhash
  index, exercising the same path through kv::table

Cleanup: remove dead files in tools/include/sysio/ left over from the
LLVM 13 / cdt-llvm era. The companion abigen.hpp was deleted in
6bb292c but codegen.hpp, gen.hpp, ppcallbacks.hpp, and
error_emitter.hpp were missed. They reference custom AST methods like
decl->isSysioIgnore() that no longer exist on vanilla LLVM 18 and
cannot compile; nothing in the active toolchain includes them.
heifner added a commit to Wire-Network/wire-sysio that referenced this pull request Apr 10, 2026
Replaces multi_index/singleton with kv::table / kv::scoped_table /
kv::global across all six production contracts (sysio.token,
sysio.bios, sysio.msig, sysio.system, sysio.roa, sysio.authex).

Storage choices:
 - kv::scoped_table  — when keys naturally have a scope (sysio.token
                       accounts/stat, sysio.msig proposal/approvals,
                       sysio.roa nodeowners/policies/sponsors)
 - kv::table         — flat tables with no scope concept
                       (sysio.system finkeys/producers/blockinfo,
                       sysio.bios abihash, sysio.authex links)
 - kv::global        — single-row config / counter state
                       (sysio.system trxpglobal, sysio.roa fin_key_id,
                       producer payment globals)

Key wins:
 - 18-38% WASM size reduction across the board (sysio.roa: -33%,
   sysio.system: -18%, sysio.authex: -10%) from RAII handles, stack
   buffers, and the if-constexpr scope path in kv::table.
 - Drop-in replacement for kv_multi_index: kv::scoped_table produces
   byte-identical primary keys, so no on-chain data migration needed.
 - Per-table table_id namespace isolation (uint16 DJB2 hash) eliminates
   the table-name byte cost in keys.

Supporting changes:
 - libraries/testing/tester.cpp: get_row_by_id() now tries the scoped
   key layout first ([scope:8B][pk:8B] for kv_multi_index/scoped_table),
   then falls back to kv::global's bare [pk:8B] layout, so existing
   helper APIs keep working across both storage shapes.
 - plugins/producer_plugin/src/trx_priority_db.cpp: rewritten to read
   trxpriority/trxpglobal under their new key formats (kv::global for
   the singleton, kv::table for the priority list).
 - programs/clio/main.cpp: msig review subcommand uses the correct
   ABI field names in `find` payloads — `proposal_name`/`account`
   instead of the legacy `primary_key`. Caught in code review of #290.
 - contracts/sysio.system/include/sysio.system/sysio.system.hpp: imports
   `sysio::kv::same_payer` instead of the now-removed `sysio::same_payer`.
 - contracts/tests: finalizer_key, roa, and system_blockinfo tests
   updated for the new storage layouts and bumped RAM where needed.
 - test_contracts/blockinfo_tester recompiled against the new sysio.system
   action set.

Reference data regenerated for the new on-chain action merkle roots:
 - unittests/deep-mind/deep-mind.log
 - unittests/snapshots/blocks.{index,log}, snap_v1.{bin.gz,bin.json.gz,json.gz}
 - unittests/test-data/consensus_blockchain/{blocks.index,blocks.log,id,snapshot}
 - tests/sysio_util_snapshot_info_test.py expected head_block_id

Depends on:
 - #290 (unified get_table_rows API + the
   secondary-index scope-prefix fix in fetch_primary)
 - Wire-Network/wire-cdt#49 (kv::scoped_table, abigen secondary_indexes
   for multi_index, canonical-RecordType checksum256 translation)
heifner added a commit to Wire-Network/wire-libraries-ts that referenced this pull request Apr 10, 2026
Wire-sysio PR Wire-Network/wire-sysio#288 widened table_def.name from
sysio::name (uint64) to a free-form std::string and appended two new
fields — table_id (uint16, DJB2 hash of the table name % 65536) and
secondary_indexes (vector<index_def>) — for KV-table per-table
namespace isolation. The wire format break means any sdk-core caller
that loads a contract ABI from on-chain via get_raw_abi (or any other
binary-encoded source) parses misaligned starting at the first table
field — name reads the wrong bytes, every subsequent field is shifted,
and ricardian_clauses ends up reading garbage.

PR Wire-Network/wire-sysio#290 separately changed the get_table_rows
HTTP response shape for KV-backed tables. Each row used to be the
decoded struct directly (or {data, payer} when show_payer was set);
the unified endpoint now returns {key, value, payer?} per row. The
ChainAPI wrapper used to destructure {data, payer} on show_payer and
then map rows through Serializer.decode for typed callers — both paths
break against the new shape.

This commit takes a hard break on the binary format (no fallback to
the legacy EOSIO 8-byte name encoding) and a backward-compatible
unwrap on the HTTP response shape.

packages/sdk-core/src/chain/Abi.ts:
 - new ABI.Index interface (name, key_type, table_id) mirroring
   sysio::chain::index_def in wire-sysio's
   libraries/chain/include/sysio/chain/abi_def.hpp.
 - ABI.Table gains optional table_id (uint16) and secondary_indexes
   (Index[]) fields. Optional so hand-built test fixtures and ABIs
   constructed programmatically don't need to specify them; the
   encoder defaults to 0 / [] when omitted.
 - fromABI() reads name as a length-prefixed string instead of an
   8-byte sysio::name uint64, then consumes table_id (uint16 LE) and
   secondary_indexes (vector<index_def>) after the existing type
   field. Also reads a trailing protobuf_types extension after enums
   so that any subsequent extension appended to abi_def parses
   cleanly to EOF.
 - toABI() mirrors the parser: writes table_def.name via
   encoder.writeString(String(table.name)) (the String() coercion
   keeps NameType-typed callers compiling), then table_id and
   secondary_indexes, and finally the empty protobuf_types string.

packages/sdk-core/src/api/v1/Chain.ts:
 - get_table_rows() now detects the unified KV row shape — each row
   has both `key` and `value` keys — and unwraps to the inner
   value before downstream processing. The old EOSIO shapes
   (decoded struct directly, or {data, payer} on show_payer) are
   left untouched, so sdk-core remains usable against EOSIO chains.
 - When show_payer is set on a KV row, payer is captured into
   ram_payers in the same way as the legacy {data, payer} path.
   Missing payer fields coerce to an empty Name rather than
   throwing.

Tests (15 new cases, 304/304 sdk-core suite green):

 packages/sdk-core/tests/chain/Abi.test.ts (8 cases)
  - Round-trips a table with table_id + empty secondary_indexes
  - Round-trips a long table name (>12 chars) — the whole reason
    name was widened
  - Round-trips secondary_indexes with checksum256 key_type
    (the case the wire-cdt #49 abigen fix unblocked end-to-end)
  - table_id 0 and 65535 boundary values
  - Missing table_id defaults to 0 on encode + decode
  - protobuf_types extension is consumed without affecting enums
  - Encoder always emits the empty protobuf_types trailer
  - Multi-table ABI end-to-end round-trip (structs + actions +
    secondary_indexes)

 packages/sdk-core/tests/api/v1/Chain.test.ts (7 cases)
  - Unwraps the new {key, value} shape into plain rows
  - Unwraps the new shape with show_payer + captures ram_payers
  - Preserves legacy plain-row shape from EOSIO chains
  - Preserves legacy {data, payer} show_payer shape from EOSIO
  - Empty rows array works for both shapes
  - Missing payer in new shape coerces to empty name (no throw)
  - Uses an in-memory MockProvider so no network access required

Out of scope:
 - packages/sdk-core/src/resources/{Ram,Rex,Powerup}.ts call
   get_table_rows for tables (rammarket, rex_pool, powup_state) that
   do not exist in wire-sysio and have not for some time. These
   resources were already dead on Wire chains regardless of any of
   the in-flight wire-sysio PRs and remain dead after this commit.
 - packages/sdk-core/src/types/SystemContractTypes.ts is auto-
   generated and would benefit from a regen pass after sysio.token's
   migration to kv::scoped_table merges (Wire-Network/wire-sysio#291),
   but the field schemas of the structs it generates do not depend
   on the binary table_def layout.

Companion PRs:
 - Wire-Network/wire-sysio#288 — chain-side table_id namespace isolation
 - Wire-Network/wire-sysio#290 — unified get_table_rows API
 - Wire-Network/wire-sysio#291 — sysio.token migrated to kv::scoped_table
 - Wire-Network/wire-cdt#49 — kv::scoped_table + abigen secondary_indexes
 - Wire-Network/wirejs-native#4 — TS schema mirror for abi_def
 - Wire-Network/abieos#12 — abieos C API + binary format update
 - Wire-Network/node-abieos#8 — bumps abieos + drops the lossy uint64 round-trip
 - Wire-Network/hyperion-history-api#9 — KV delta handler + AbiDefinitions
… body

`process_function` inserts every walked function into `func_calls` with an
initially-empty `vector<CallExpr*>` *before* traversing its body, so a
non-end iterator from `func_calls.find(ra)` does not by itself indicate the
read-only action transitively calls a write host function — only a non-empty
vector does. Without this guard `process_read_only_actions` fires on every
read-only action whose body has been processed, regardless of its content,
making `[[sysio::action, sysio::read_only]]` unusable for any non-trivial
action.

The existing `tests/toolchain/compile-pass/warn_action_read_only.cpp` only
ran in `--warn-action-read-only` mode (warning, not error) and intentionally
called `set_resource_limit`, so it tested the violation path; there was no
test exercising a clean read-only action with a non-trivial body, which is
why this regression sat undetected.

Surfaced while adding a `getproposal` read-only action to sysio.msig that
just calls `kv_get` / `kv_contains` (both read-only intrinsics) — every
attempt to compile the contract failed with "read-only action cannot call
write host function" until this fix.
heifner added 6 commits April 10, 2026 21:17
Two issues combined to break compile-fail/host_functions_tests after
commit 95b19d9 fixed the read-only validator false positive:

1. Test had stale kv_idx_{store,remove,update} forward declarations
   using the pre-refactor (payer, uint64_t table, uint32_t index_id)
   signature instead of the current (payer, uint32_t table_id) form,
   causing conflicting-type errors before codegen ran.

2. write_host_funcs in gen.hpp was never updated when the KV API
   replaced the legacy db_*_i64 intrinsics (5159683). The old
   validator over-reported every read-only action body regardless
   of content, so the test's 15 actions were flagged by coincidence;
   once that bug was fixed only 10 legitimate violations fired.

Add kv_set, kv_erase, kv_idx_store, kv_idx_remove, kv_idx_update to
write_host_funcs and drop the 22 dead db_* entries that no longer
exist as intrinsics.
defined_in_contract() is a fallback check that returns false when
contract_class is null. The warning it emitted in that branch pointed
at the template instantiation location (kv_multi_index.hpp:202 — the
class template definition inside the header), which is useless as a
source location for the user.

The "currently this is unreachable as we do not traverse non-main
file translation units" comment was wrong: multi_index is now a type
alias for kv_multi_index, so any TU including sysio.hpp that
instantiates multi_index<...> walks this path, and the warning fires
whenever --contract name doesn't match the class in the file (e.g.
the toolchain-tester deriving --contract from filename).
The previous (codegen error.*){15}/{20} count regexes in
host_functions_tests.json and hf_indirect_call_tests.json only
verified a raw count, so silently passed whenever the flagged-action
set shifted — exactly how the write_host_funcs/kv_idx_* drift sat
undetected until commit 95b19d9 changed the count.

Teach handle_expecteds to accept either a string or a list of
patterns and apply each independently, then pin both tests to
per-action name markers (codegen error.*bool <name>\b). Adding,
removing, or renaming a flagged action now fails its specific
pattern with a clear hint.

Also fix a drive-by -Wreturn-type warning in
hf_indirect_call_tests.cpp taliasma() which was missing the trailing
return true; the test is expected to fail codegen, not ordinary
compilation, so the warning was noise.
When a contract is compiled natively (add_native_contract), CDT defines
uint128_t and int128_t as preprocessor macros expanding to the two-token
type names `unsigned __int128` and `__int128`. Functional-cast syntax
`uint128_t(x)` then becomes `unsigned __int128(x)`, which the compiler
rejects with "type-id cannot have a name" because functional casts
require a single type-id token.

Switch the two offending casts in be_key_reader to static_cast form,
which accepts multi-token type names. Behaviour is unchanged in the
normal WASM toolchain compile where uint128_t is a typedef.
Covers uint128_t and int128_t round-trip through be_key_reader:
uint128_t decode, int128_t zero, int128_t -1, int128_t INT128_MAX.
Runs as a native unit test so any future regression in the
uint128_t/int128_t handling in kv_table.hpp fails at test time
rather than at contract-build time.
cdt-codegen merges every .desc file it finds in the output directory
into the contract ABI, assuming each one comes from a current source
.cpp in the same contract. When a .cpp is removed (or renamed) but its
.desc is left behind in the build dir, the stale file gets merged too
and abimerge fails with:

    Error, ABI structs malformed : <name> already defined

The user must manually clean the build dir to recover.

Stamp the originating source path into each .desc as `____source_file`
when it is generated. On the read side, skip (and unlink) any .desc
whose recorded source no longer exists on disk, so an incremental build
recovers automatically.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant