Skip to content

Copilot improvements#5111

Open
graydon wants to merge 2 commits intostellar:masterfrom
graydon:copilot-improvements
Open

Copilot improvements#5111
graydon wants to merge 2 commits intostellar:masterfrom
graydon:copilot-improvements

Conversation

@graydon
Copy link
Contributor

@graydon graydon commented Jan 28, 2026

This PR does 3 main things:

  1. Fixes the devcontainer to work better with noble and upgrades its contents a bit.
  2. Rewrites the copilot-instructions.md to mostly just tell the agent to use other tools & skills.
  3. Adds a bunch of basic skills for working with core (including a skill to help write more skills).

I don't think there's much to object to in the skills and I am also not all that certain that copilot will use them as actively as we'd like; on the other hand it roughly seems to work with my local testing and enables some behaviors that I was otherwise telling it over and over / watching it do wrong. so I think it's probably worth starting off with something like this. if we find them making things worse they can always be edited back down / removed.

Copilot AI review requested due to automatic review settings January 28, 2026 04:07
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This pull request makes three key improvements to the stellar-core development environment and tooling: updates the devcontainer configuration for Ubuntu 24.04 compatibility, restructures the Copilot instructions to emphasize using tools and skills, and introduces eight comprehensive skills for code development, review, testing, and validation workflows.

Changes:

  • Updated devcontainer configuration from Ubuntu 20.04 to 24.04 (noble) with modern VSCode customizations and development tool extensions
  • Rewrote Copilot instructions to guide agents toward using LSP tools, subagents, and skill-based workflows rather than directly providing coding guidelines
  • Added eight new skill documents covering validation, testing, building, code review (high and low level), test addition, build configuration, and skill creation

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
.github/copilot-instructions.md Rewrites instructions to emphasize agent-based workflows and tool usage instead of direct coding guidelines
.devcontainer/devcontainer.json Updates container name to Ubuntu 24.04, modernizes customizations structure, and adds development extensions
.devcontainer/Dockerfile Updates for Ubuntu 24.04 sources, adds user/group handling logic, and installs additional development tools
.claude/skills/validating-a-change/SKILL.md Defines comprehensive validation workflow orchestrating multiple review and testing steps
.claude/skills/running-tests/SKILL.md Provides detailed test execution guidance from smoke tests through sanitizer builds
.claude/skills/running-make-to-build/SKILL.md Documents build system structure and correct usage patterns
.claude/skills/low-level-code-review/SKILL.md Establishes mechanical code review process for localized issues
.claude/skills/high-level-code-review/SKILL.md Defines semantic code review criteria for correctness and design consistency
.claude/skills/configuring-the-build/SKILL.md Documents build configuration options and autotools workflow
.claude/skills/adding-tests/SKILL.md Guides test analysis and creation for various change types
.claude/skills/adding-a-new-skill/SKILL.md Provides meta-framework for creating additional skills

@graydon graydon force-pushed the copilot-improvements branch 2 times, most recently from 29a76d8 to 9c2a077 Compare January 28, 2026 05:46
leighmcculloch
leighmcculloch previously approved these changes Jan 28, 2026

Example:
```bash
./stellar-core test --ll fatal -r simple --disable-dots --abort "test name"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you have success with agents running tests consistently using this skill? I wonder if a short version of this should belong to instructions for better discoverability.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would not say "consistently", no. I'll put it in instructions.


# Overview

The build works like this:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we need to be so verbose here. Wouldn't just specifying a common command line work? I also wonder if that command line could just go to instructions. I'm fairly skeptical of a need to do something complex with the build, using a single fixed command line would probably be sufficient for local development, and re-configuring builds is generally rare.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree makes sense to hoist the "most likely" configure / make / run-one-test / make check in parallel commands up to the instructions file. I figured I'd tell them some details on other parts of the build system since they're not uncommon in my experience -- I switch compilers or configure flags fairly often -- but there's no need for them to dig into such a file the majority of the time. I'll make this adjustment.

(Also it seems like I was a bit too optimistic about the ability of subagents to execute complex steps on their own, they have a tendency to just wander off; I will alter the language around them)

@graydon graydon force-pushed the copilot-improvements branch from 54b30ec to de620a9 Compare January 28, 2026 19:58
Copy link
Contributor

@marta-lokhova marta-lokhova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couple of suggestions -- feel free to take it or leave it. These are some things I found to work ok locally. Some of these are more explicit wordings of what you already have. That being said, no rigorous evaluation has been done on any of these instructions, all evidence is purely anecdotal.


- Ensure functional tests still pass
- Consider adding benchmark tests if appropriate
- Consider adding metrics or tracy ZoneScoped annotations to help
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd also specifically call out LogSlowExecution helper to flag perf degradations.


## Step 4: Check for Randomized Test Needs

For changes affecting:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just curious, have you found the adding-tests skill to work well with more complex changes that involve flows between multiple subsystems or complex places like SCP or Ledger state management? (I've seen very inconsistent results).

in this codebase:

1. **Test fixture setup**: Look for how test fixtures are created
2. **Assertion style**: Match the assertion macros used elsewhere
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May be worth being a bit more explicit here, since I've seen Claude consistently silently fail tests with warnings and early returns (instead of crashing). For agentic stuff, I also found it helpful to always use short timeouts because sometimes half-broken code loops until cli times out (this wastes a couple of minutes on each iteration). Example checks:

### 8. Test Strictness
- [ ] Do my tests have timeouts? (no infinite hangs)
- [ ] Do my tests assert on failure? (no silent failures or early returns)
- [ ] Am I using logging where I should use assert/panic?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

re timeouts: can also add an explicit "wrap each stellar-core test with timeout"

3. **Build an understanding of context**. Use LSP tools to get lists of symbols
and call graphs and type hierarchies as necessary. For each modified file,
read enough of the surrounding code to understand the context (at minimum,
the entire function or class being modified).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sort of related, but I've found it useful to force agents to never make assumptions (e.g. it will think it understands the problem and context, but with a lot of assumptions, which most of the time are incorrect)

Am I assuming requirements, constraints, or behavior? → ASK FIRST
Am I guessing how something works instead of reading code/docs? → READ FIRST

- Are errors detected and reported appropriately?
- Are error messages clear and actionable?
- Is error recovery handled correctly?
- Are exceptions used appropriately (if at all)?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might be saying similar things differently, but I found it useful to specify explicitly that we expect the program to correctly crash vs throw an exception vs catch the exception depending on the subsystem we're dealing with. Another few bullets that might be helpful:

1. **Crash Recovery & Durability** - Can data be corrupted or lost?
2. **Atomicity & Consistency** - What state is left after partial failure?
3. **Correct Usage of APIs** - Are new/modified interfaces used correctly everywhere?

- Are error messages clear and actionable?
- Is error recovery handled correctly?
- Are exceptions used appropriately (if at all)?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, I'm tempted to add a section on fsync and file system stuff specifically, just because it's been such a pain, but I'm not sure if it's too specific...

**Atomic file patterns:**
- Look for write-to-temp + rename patterns
- Verify temp file is fsynced before rename
- Verify directory is fsynced after rename
- Check: what happens if crash occurs between steps?

**Questions to ask:**
- If power is lost mid-operation, what state is on disk?
- Can the operation be resumed/retried after crash?
- Are there ordering dependencies between file writes?

- **Double-free**: Freeing or deleting a resource twice.
- **Boolean logic errors**: Wrong operator precedence, De Morgan's law mistakes,
inverted conditions.
- **String formatting mismatches**: Format specifier doesn't match argument type
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we add misuse of C++ containers and standard library in general? (like wrong params, UB, etc)

- NEVER run from a subdirectory
- NEVER run with `make -C <somedir>` for any other directory
- NEVER run `cargo` manually, let `make` run it
- NEVER edit `Makefile` or `Makefile.in`, only ever edit `Makefile.am`
Copy link
Contributor

@marta-lokhova marta-lokhova Jan 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It also helps to run make command with | tail -N to avoid dumping a gigantic make output into context.

2. Adding any missing tests
3. Running tests
4. Multiple configurations (catches config-specific issues)
5. Formatting last (mechanical cleanup)
Copy link
Contributor

@marta-lokhova marta-lokhova Jan 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@MonsieurNicolas recently pointed out a cool trick to produce clean commit history without changing the code, https://x.com/steveruizok/status/2014095245198176744

Not sure how well it works for core, but might be worth adding for experimentation.

- NEVER run from a subdirectory
- NEVER run with `make -C <somedir>` for any other directory
- NEVER run `cargo` manually, let `make` run it
- NEVER edit `Makefile` or `Makefile.in`, only ever edit `Makefile.am`
Copy link
Contributor

@SirTyson SirTyson Jan 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've found it helpful to include instructions for autogen and configure. Otherwise, it gets really confused by build errors encountered when adding a new file, maybe something like:

ALWAYS stage new files when created for the first time
ALWAYS run ./autogen.sh && --enable-ccache --enable-sdfprefs --enable-tracy --enable-tracy-gui --enable-tracy-capture (my personal default flags) when adding new files initially
NEVER run ./atuogen.sh or ./configure unless you have created a new file.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor, but it is a little faster to use make-mks instead of autogen and configure when adding new files.

- **Unused variables**: Variables declared but never used.
- **Unused includes**: Headers included but nothing from them appears to be used
in the changed code.
- **Dead code**: Code that can never execute (after unconditional return, etc.).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe east-const here? I'm not sure where to put it since I haven't used lower level skills like this before, but I have a style guide in my agents .md with a few things like RAII always, only smart pointers, no raw pointers, east const, C++17.

github-merge-queue bot pushed a commit that referenced this pull request Jan 30, 2026
Since there was a lot more feedback than I anticipated on
#5111 I've split out the
most important step of it for now: fixing the devcontainer to actually
work with noble / support our current build. The devcontainer literally
doesn't work currently, so this is fairly high value to land asap.
@SirTyson SirTyson force-pushed the copilot-improvements branch from 1dda725 to de620a9 Compare February 2, 2026 21:24
@SirTyson
Copy link
Contributor

SirTyson commented Feb 2, 2026

Sorry, I was testing with this locally and accidentally pushed something here instead of to my fork, should be reverted now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants