Conversation
Pull request overview
This pull request makes three key improvements to the stellar-core development environment and tooling: updates the devcontainer configuration for Ubuntu 24.04 compatibility, restructures the Copilot instructions to emphasize using tools and skills, and introduces eight comprehensive skills for code development, review, testing, and validation workflows.
Changes:
- Updated devcontainer configuration from Ubuntu 20.04 to 24.04 (noble) with modern VSCode customizations and development tool extensions
- Rewrote Copilot instructions to guide agents toward using LSP tools, subagents, and skill-based workflows rather than directly providing coding guidelines
- Added eight new skill documents covering validation, testing, building, code review (high and low level), test addition, build configuration, and skill creation
Reviewed changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 6 comments.
Summary per file:
| File | Description |
|---|---|
| .github/copilot-instructions.md | Rewrites instructions to emphasize agent-based workflows and tool usage instead of direct coding guidelines |
| .devcontainer/devcontainer.json | Updates container name to Ubuntu 24.04, modernizes customizations structure, and adds development extensions |
| .devcontainer/Dockerfile | Updates for Ubuntu 24.04 sources, adds user/group handling logic, and installs additional development tools |
| .claude/skills/validating-a-change/SKILL.md | Defines comprehensive validation workflow orchestrating multiple review and testing steps |
| .claude/skills/running-tests/SKILL.md | Provides detailed test execution guidance from smoke tests through sanitizer builds |
| .claude/skills/running-make-to-build/SKILL.md | Documents build system structure and correct usage patterns |
| .claude/skills/low-level-code-review/SKILL.md | Establishes mechanical code review process for localized issues |
| .claude/skills/high-level-code-review/SKILL.md | Defines semantic code review criteria for correctness and design consistency |
| .claude/skills/configuring-the-build/SKILL.md | Documents build configuration options and autotools workflow |
| .claude/skills/adding-tests/SKILL.md | Guides test analysis and creation for various change types |
| .claude/skills/adding-a-new-skill/SKILL.md | Provides meta-framework for creating additional skills |
(force-pushed from 29a76d8 to 9c2a077)
> Example:
>
> ```bash
> ./stellar-core test --ll fatal -r simple --disable-dots --abort "test name"
> ```
Did you have success with agents running tests consistently using this skill? I wonder if a short version of this should go in the instructions for better discoverability.
I would not say "consistently", no. I'll put it in instructions.
> # Overview
>
> The build works like this:
I wonder if we need to be so verbose here. Wouldn't just specifying a common command line work? I also wonder if that command line could just go in the instructions. I'm fairly skeptical of a need to do something complex with the build; using a single fixed command line would probably be sufficient for local development, and re-configuring builds is generally rare.
I agree it makes sense to hoist the "most likely" configure / make / run-one-test / make-check-in-parallel commands up to the instructions file. I figured I'd tell them some details on other parts of the build system since they're not uncommon in my experience -- I switch compilers or configure flags fairly often -- but there's no need for them to dig into such a file the majority of the time. I'll make this adjustment.
(Also, it seems I was a bit too optimistic about the ability of subagents to execute complex steps on their own; they have a tendency to just wander off. I will alter the language around them.)
(force-pushed from 54b30ec to de620a9)
marta-lokhova left a comment:
Couple of suggestions -- feel free to take them or leave them. These are some things I found to work OK locally. Some of these are more explicit wordings of what you already have. That said, no rigorous evaluation has been done on any of these instructions; all evidence is purely anecdotal.
> - Ensure functional tests still pass
> - Consider adding benchmark tests if appropriate
> - Consider adding metrics or tracy ZoneScoped annotations to help
I'd also specifically call out the LogSlowExecution helper to flag perf degradations.
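For context, the general shape of such a helper is a scope guard that logs when a scope runs longer than a threshold. The sketch below is illustrative only and is not stellar-core's actual LogSlowExecution interface; the class, its constructor arguments, and `applyBucketBatch` are assumptions for the example.

```cpp
// Minimal sketch of a "log if slow" scope guard; illustrative only, not the
// real LogSlowExecution API. The logging call is a stand-in for the codebase's
// own logging macros (e.g. a CLOG_WARNING-style macro).
#include <chrono>
#include <iostream>
#include <string>

class ScopedSlowLogger
{
    std::string mName;
    std::chrono::milliseconds mThreshold;
    std::chrono::steady_clock::time_point mStart;

  public:
    ScopedSlowLogger(std::string name, std::chrono::milliseconds threshold)
        : mName(std::move(name))
        , mThreshold(threshold)
        , mStart(std::chrono::steady_clock::now())
    {
    }

    ~ScopedSlowLogger()
    {
        auto elapsed = std::chrono::duration_cast<std::chrono::milliseconds>(
            std::chrono::steady_clock::now() - mStart);
        if (elapsed > mThreshold)
        {
            // Stand-in for the real logging macro.
            std::cerr << mName << " took " << elapsed.count() << "ms\n";
        }
    }
};

void
applyBucketBatch()
{
    // Flags the whole function body if it runs longer than 100ms.
    ScopedSlowLogger slow("applyBucketBatch", std::chrono::milliseconds(100));
    // ... work ...
}
```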
> ## Step 4: Check for Randomized Test Needs
>
> For changes affecting:
Just curious, have you found the adding-tests skill to work well with more complex changes that involve flows between multiple subsystems or complex places like SCP or Ledger state management? (I've seen very inconsistent results).
> in this codebase:
>
> 1. **Test fixture setup**: Look for how test fixtures are created
> 2. **Assertion style**: Match the assertion macros used elsewhere
May be worth being a bit more explicit here, since I've seen Claude consistently fail tests silently via warnings and early returns (instead of crashing). For agentic work, I also found it helpful to always use short timeouts, because half-broken code sometimes loops until the CLI times out (which wastes a couple of minutes on each iteration). Example checks:
### 8. Test Strictness
- [ ] Do my tests have timeouts? (no infinite hangs)
- [ ] Do my tests assert on failure? (no silent failures or early returns)
- [ ] Am I using logging where I should use assert/panic?
Re timeouts: could also add an explicit "wrap each stellar-core test with `timeout`".
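To make the "assert on failure" point concrete, here is a minimal Catch-style sketch; the `lib/catch.hpp` include path and the `tryApplyBatch` helper are assumptions for illustration, not code from the skill or the codebase. The timeout point is a shell-level concern, e.g. wrapping the test invocation in the `timeout` utility.

```cpp
// Hedged sketch: `tryApplyBatch` is a hypothetical stand-in for whatever the
// test exercises; the catch.hpp path assumes a vendored copy under lib/.
#include "lib/catch.hpp"

#include <optional>

static std::optional<int>
tryApplyBatch()
{
    return 3; // pretend work
}

TEST_CASE("batch apply reports a count", "[example]")
{
    auto applied = tryApplyBatch();

    // Anti-pattern the checklist warns about: a silent early return hides a
    // broken setup and the test "passes" without checking anything.
    //   if (!applied) return;

    // Fail loudly instead, so agents and CI cannot mistake failure for success.
    REQUIRE(applied.has_value());
    REQUIRE(*applied == 3);
}
```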
> 3. **Build an understanding of context**. Use LSP tools to get lists of symbols
>    and call graphs and type hierarchies as necessary. For each modified file,
>    read enough of the surrounding code to understand the context (at minimum,
>    the entire function or class being modified).
Sort of related, but I've found it useful to force agents to never make assumptions (e.g. it will think it understands the problem and context, but with a lot of assumptions, most of which are incorrect):
- Am I assuming requirements, constraints, or behavior? → ASK FIRST
- Am I guessing how something works instead of reading code/docs? → READ FIRST
> - Are errors detected and reported appropriately?
> - Are error messages clear and actionable?
> - Is error recovery handled correctly?
> - Are exceptions used appropriately (if at all)?
This might be saying similar things differently, but I found it useful to specify explicitly that we expect the program to correctly crash vs throw an exception vs catch the exception depending on the subsystem we're dealing with. Another few bullets that might be helpful:
1. **Crash Recovery & Durability** - Can data be corrupted or lost?
2. **Atomicity & Consistency** - What state is left after partial failure?
3. **Correct Usage of APIs** - Are new/modified interfaces used correctly everywhere?
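As a rough illustration of the crash-vs-throw-vs-catch distinction: all function names below are hypothetical, and the hard-abort case merely stands in for a releaseAssert-style check; this is a sketch of the policy, not code from any subsystem.

```cpp
// Illustrative only: three different "correct" reactions to a failure,
// depending on the subsystem's contract.
#include <cstdlib>
#include <iostream>
#include <stdexcept>
#include <string>

// 1. Invariant violation in consensus-critical state: crash immediately
//    (releaseAssert-style) rather than limp along with corrupt state.
void
applyLedgerEntry(bool invariantHolds)
{
    if (!invariantHolds)
    {
        std::cerr << "invariant violated, aborting\n";
        std::abort();
    }
}

// 2. Bad external input: throw, and let the caller decide how to report it.
int
parseConfigValue(std::string const& s)
{
    if (s.empty())
    {
        throw std::invalid_argument("empty config value");
    }
    return std::stoi(s);
}

// 3. Expected, recoverable failure at a subsystem boundary: catch and retry
//    or degrade gracefully instead of propagating.
static bool
doNetworkFetch(int attempt)
{
    if (attempt == 0)
    {
        throw std::runtime_error("transient failure");
    }
    return true;
}

bool
fetchWithRetry(int attempts)
{
    for (int i = 0; i < attempts; ++i)
    {
        try
        {
            return doNetworkFetch(i);
        }
        catch (std::runtime_error const&)
        {
            // log and retry
        }
    }
    return false;
}
```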
> - Are error messages clear and actionable?
> - Is error recovery handled correctly?
> - Are exceptions used appropriately (if at all)?
Also, I'm tempted to add a section on fsync and file system stuff specifically, just because it's been such a pain, but I'm not sure if it's too specific...
**Atomic file patterns:**
- Look for write-to-temp + rename patterns
- Verify temp file is fsynced before rename
- Verify directory is fsynced after rename
- Check: what happens if crash occurs between steps?
**Questions to ask:**
- If power is lost mid-operation, what state is on disk?
- Can the operation be resumed/retried after crash?
- Are there ordering dependencies between file writes?
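A heavily simplified sketch of that write-to-temp + rename pattern with both fsyncs, using raw POSIX calls; this illustrates the technique only and is not necessarily how stellar-core's own file utilities are written.

```cpp
// Atomic file write: write temp -> fsync temp -> rename -> fsync directory.
#include <fcntl.h>
#include <unistd.h>

#include <cstdio>
#include <stdexcept>
#include <string>

void
atomicWrite(std::string const& dir, std::string const& name,
            std::string const& data)
{
    std::string tmp = dir + "/" + name + ".tmp";
    std::string dst = dir + "/" + name;

    int fd = ::open(tmp.c_str(), O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
    {
        throw std::runtime_error("open failed");
    }
    if (::write(fd, data.data(), data.size()) !=
        static_cast<ssize_t>(data.size()))
    {
        ::close(fd);
        throw std::runtime_error("short write");
    }
    // 1. fsync the temp file so its contents are durable before the rename.
    if (::fsync(fd) != 0)
    {
        ::close(fd);
        throw std::runtime_error("fsync failed");
    }
    ::close(fd);

    // 2. rename is atomic within a filesystem: readers see the old file or the
    //    new one, never a half-written file.
    if (::rename(tmp.c_str(), dst.c_str()) != 0)
    {
        throw std::runtime_error("rename failed");
    }

    // 3. fsync the directory so the rename itself survives a crash.
    int dfd = ::open(dir.c_str(), O_RDONLY | O_DIRECTORY);
    if (dfd >= 0)
    {
        ::fsync(dfd);
        ::close(dfd);
    }
}
```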
> - **Double-free**: Freeing or deleting a resource twice.
> - **Boolean logic errors**: Wrong operator precedence, De Morgan's law mistakes, inverted conditions.
> - **String formatting mismatches**: Format specifier doesn't match argument type
Should we add misuse of C++ containers and the standard library in general (wrong parameters, UB, etc.)?
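Tiny illustrative examples of a few of these categories, including the container-misuse case; this is made-up code, not taken from the skill or the codebase.

```cpp
// Illustrative bug patterns only; all names are hypothetical.
#include <cstdint>
#include <cstdio>
#include <vector>

void
reviewExamples(std::vector<int>& values, std::uint64_t id, bool a, bool b)
{
    // Boolean precedence: `a || b && !values.empty()` parses as
    // `a || (b && !values.empty())`; parenthesize to get the intended grouping.
    bool ready = (a || b) && !values.empty();
    (void)ready;

    // Format specifier mismatch: "%d" with a 64-bit value is undefined
    // behavior; cast and use the matching specifier.
    std::printf("id=%llu\n", static_cast<unsigned long long>(id));

    // Container misuse: erase() invalidates the iterator; use its return value
    // instead of incrementing a now-dangling iterator.
    for (auto it = values.begin(); it != values.end();)
    {
        if (*it == 0)
        {
            it = values.erase(it);
        }
        else
        {
            ++it;
        }
    }
}
```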
> - NEVER run from a subdirectory
> - NEVER run with `make -C <somedir>` for any other directory
> - NEVER run `cargo` manually, let `make` run it
> - NEVER edit `Makefile` or `Makefile.in`, only ever edit `Makefile.am`
It also helps to run the make command with `| tail -N` to avoid dumping gigantic make output into the context.
> 2. Adding any missing tests
> 3. Running tests
> 4. Multiple configurations (catches config-specific issues)
> 5. Formatting last (mechanical cleanup)
@MonsieurNicolas recently pointed out a cool trick for producing a clean commit history without changing the code: https://x.com/steveruizok/status/2014095245198176744
Not sure how well it works for core, but it might be worth adding for experimentation.
> - NEVER run from a subdirectory
> - NEVER run with `make -C <somedir>` for any other directory
> - NEVER run `cargo` manually, let `make` run it
> - NEVER edit `Makefile` or `Makefile.in`, only ever edit `Makefile.am`
I've found it helpful to include instructions for autogen and configure; otherwise it gets really confused by build errors encountered when adding a new file. Maybe something like:
- ALWAYS stage new files when they are first created
- ALWAYS run `./autogen.sh && ./configure --enable-ccache --enable-sdfprefs --enable-tracy --enable-tracy-gui --enable-tracy-capture` (my personal default flags) when adding new files initially
- NEVER run `./autogen.sh` or `./configure` unless you have created a new file
Minor, but it is a little faster to use `make-mks` instead of autogen and configure when adding new files.
> - **Unused variables**: Variables declared but never used.
> - **Unused includes**: Headers included but nothing from them appears to be used in the changed code.
> - **Dead code**: Code that can never execute (after unconditional return, etc.).
Maybe east-const here? I'm not sure where to put it since I haven't used lower-level skills like this before, but I have a style guide in my agents.md with a few things like RAII always, only smart pointers, no raw pointers, east const, C++17.
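For what it's worth, a small hypothetical sketch of those style points together: east const, smart pointers instead of raw owning pointers, and RAII (C++17). The class and members are made up for illustration.

```cpp
// Hypothetical example of the style points: east const, no raw owning
// pointers, RAII ownership.
#include <memory>
#include <string>

class Session
{
    std::string const mName;      // east const on members and parameters
    std::unique_ptr<int> mHandle; // owning pointer is a smart pointer

  public:
    explicit Session(std::string const& name)
        : mName(name), mHandle(std::make_unique<int>(0))
    {
    }
    // No destructor needed: members clean up after themselves (RAII).
};
```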
Since there was a lot more feedback than I anticipated on #5111, I've split out the most important step of it for now: fixing the devcontainer to actually work with noble and support our current build. The devcontainer literally doesn't work currently, so this is fairly high-value to land ASAP.
(force-pushed from 1dda725 to de620a9)
Sorry, I was testing with this locally and accidentally pushed something here instead of to my fork; it should be reverted now.
This PR does 3 main things:
I don't think there's much to object to in the skills, and I am also not all that certain that Copilot will use them as actively as we'd like; on the other hand, it roughly seems to work in my local testing and enables some behaviors that I was otherwise telling it over and over / watching it do wrong. So I think it's probably worth starting off with something like this. If we find them making things worse, they can always be edited back down or removed.