Conversation
Pull request overview
This pull request makes three key improvements to the stellar-core development environment and tooling: updates the devcontainer configuration for Ubuntu 24.04 compatibility, restructures the Copilot instructions to emphasize using tools and skills, and introduces eight comprehensive skills for code development, review, testing, and validation workflows.
Changes:
- Updated devcontainer configuration from Ubuntu 20.04 to 24.04 (noble) with modern VSCode customizations and development tool extensions
- Rewrote Copilot instructions to guide agents toward using LSP tools, subagents, and skill-based workflows rather than directly providing coding guidelines
- Added eight new skill documents covering validation, testing, building, code review (high and low level), test addition, build configuration, and skill creation
Reviewed changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 6 comments.
Summary per file:
| File | Description |
|---|---|
| .github/copilot-instructions.md | Rewrites instructions to emphasize agent-based workflows and tool usage instead of direct coding guidelines |
| .devcontainer/devcontainer.json | Updates container name to Ubuntu 24.04, modernizes customizations structure, and adds development extensions |
| .devcontainer/Dockerfile | Updates for Ubuntu 24.04 sources, adds user/group handling logic, and installs additional development tools |
| .claude/skills/validating-a-change/SKILL.md | Defines comprehensive validation workflow orchestrating multiple review and testing steps |
| .claude/skills/running-tests/SKILL.md | Provides detailed test execution guidance from smoke tests through sanitizer builds |
| .claude/skills/running-make-to-build/SKILL.md | Documents build system structure and correct usage patterns |
| .claude/skills/low-level-code-review/SKILL.md | Establishes mechanical code review process for localized issues |
| .claude/skills/high-level-code-review/SKILL.md | Defines semantic code review criteria for correctness and design consistency |
| .claude/skills/configuring-the-build/SKILL.md | Documents build configuration options and autotools workflow |
| .claude/skills/adding-tests/SKILL.md | Guides test analysis and creation for various change types |
| .claude/skills/adding-a-new-skill/SKILL.md | Provides meta-framework for creating additional skills |
(force-pushed from 29a76d8 to 9c2a077)
> Example:
>
> ```bash
> ./stellar-core test --ll fatal -r simple --disable-dots --abort "test name"
> ```
Did you have success with agents running tests consistently using this skill? I wonder if a short version of this should go in the instructions for better discoverability.
I would not say "consistently", no. I'll put it in instructions.
> # Overview
>
> The build works like this:
I wonder if we need to be so verbose here. Wouldn't just specifying a common command line work? I also wonder if that command line could just go in the instructions. I'm fairly skeptical of a need to do something complex with the build; using a single fixed command line would probably be sufficient for local development, and re-configuring builds is generally rare.
I agree it makes sense to hoist the "most likely" configure / make / run-one-test / make-check-in-parallel commands up to the instructions file. I figured I'd tell them some details on other parts of the build system since they're not uncommon in my experience -- I switch compilers or configure flags fairly often -- but there's no need for them to dig into such a file the majority of the time. I'll make this adjustment.
(Also, it seems I was a bit too optimistic about the ability of subagents to execute complex steps on their own; they have a tendency to just wander off. I will alter the language around them.)
(force-pushed from 54b30ec to de620a9)
marta-lokhova left a comment:
Couple of suggestions -- feel free to take them or leave them. These are some things I found to work OK locally. Some of these are more explicit wordings of what you already have. That said, no rigorous evaluation has been done on any of these instructions; all evidence is purely anecdotal.
> - Ensure functional tests still pass
> - Consider adding benchmark tests if appropriate
> - Consider adding metrics or tracy ZoneScoped annotations to help
I'd also specifically call out the LogSlowExecution helper to flag perf degradations.
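For context, the general shape of such a helper is a scope guard that logs when a scope runs longer than a threshold. The sketch below is illustrative only and is not stellar-core's actual LogSlowExecution interface; the class, its constructor arguments, and `applyBucketBatch` are assumptions for the example.

```cpp
// Minimal sketch of a "log if slow" scope guard; illustrative only, not the
// real LogSlowExecution API. The logging call is a stand-in for the codebase's
// own logging macros (e.g. a CLOG_WARNING-style macro).
#include <chrono>
#include <iostream>
#include <string>

class ScopedSlowLogger
{
    std::string mName;
    std::chrono::milliseconds mThreshold;
    std::chrono::steady_clock::time_point mStart;

  public:
    ScopedSlowLogger(std::string name, std::chrono::milliseconds threshold)
        : mName(std::move(name))
        , mThreshold(threshold)
        , mStart(std::chrono::steady_clock::now())
    {
    }

    ~ScopedSlowLogger()
    {
        auto elapsed = std::chrono::duration_cast<std::chrono::milliseconds>(
            std::chrono::steady_clock::now() - mStart);
        if (elapsed > mThreshold)
        {
            // Stand-in for the real logging macro.
            std::cerr << mName << " took " << elapsed.count() << "ms\n";
        }
    }
};

void
applyBucketBatch()
{
    // Flags the whole function body if it runs longer than 100ms.
    ScopedSlowLogger slow("applyBucketBatch", std::chrono::milliseconds(100));
    // ... work ...
}
```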
> ## Step 4: Check for Randomized Test Needs
>
> For changes affecting:
Just curious, have you found the adding-tests skill to work well with more complex changes that involve flows between multiple subsystems or complex places like SCP or Ledger state management? (I've seen very inconsistent results).
> in this codebase:
>
> 1. **Test fixture setup**: Look for how test fixtures are created
> 2. **Assertion style**: Match the assertion macros used elsewhere
May be worth being a bit more explicit here, since I've seen Claude consistently fail tests silently via warnings and early returns (instead of crashing). For agentic work, I also found it helpful to always use short timeouts, because half-broken code sometimes loops until the CLI times out (which wastes a couple of minutes on each iteration). Example checks:
### 8. Test Strictness
- [ ] Do my tests have timeouts? (no infinite hangs)
- [ ] Do my tests assert on failure? (no silent failures or early returns)
- [ ] Am I using logging where I should use assert/panic?
Re timeouts: could also add an explicit "wrap each stellar-core test with `timeout`".
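To make the "assert on failure" point concrete, here is a minimal Catch-style sketch; the `lib/catch.hpp` include path and the `tryApplyBatch` helper are assumptions for illustration, not code from the skill or the codebase. The timeout point is a shell-level concern, e.g. wrapping the test invocation in the `timeout` utility.

```cpp
// Hedged sketch: `tryApplyBatch` is a hypothetical stand-in for whatever the
// test exercises; the catch.hpp path assumes a vendored copy under lib/.
#include "lib/catch.hpp"

#include <optional>

static std::optional<int>
tryApplyBatch()
{
    return 3; // pretend work
}

TEST_CASE("batch apply reports a count", "[example]")
{
    auto applied = tryApplyBatch();

    // Anti-pattern the checklist warns about: a silent early return hides a
    // broken setup and the test "passes" without checking anything.
    //   if (!applied) return;

    // Fail loudly instead, so agents and CI cannot mistake failure for success.
    REQUIRE(applied.has_value());
    REQUIRE(*applied == 3);
}
```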
> 3. **Build an understanding of context**. Use LSP tools to get lists of symbols
>    and call graphs and type hierarchies as necessary. For each modified file,
>    read enough of the surrounding code to understand the context (at minimum,
>    the entire function or class being modified).
Sort of related, but I've found it useful to force agents to never make assumptions (e.g. it will think it understands the problem and context, but with a lot of assumptions, most of which are incorrect):
- Am I assuming requirements, constraints, or behavior? → ASK FIRST
- Am I guessing how something works instead of reading code/docs? → READ FIRST
> - Are errors detected and reported appropriately?
> - Are error messages clear and actionable?
> - Is error recovery handled correctly?
> - Are exceptions used appropriately (if at all)?
This might be saying similar things differently, but I found it useful to specify explicitly that we expect the program to correctly crash vs throw an exception vs catch the exception depending on the subsystem we're dealing with. Another few bullets that might be helpful:
1. **Crash Recovery & Durability** - Can data be corrupted or lost?
2. **Atomicity & Consistency** - What state is left after partial failure?
3. **Correct Usage of APIs** - Are new/modified interfaces used correctly everywhere?
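As a rough illustration of the crash-vs-throw-vs-catch distinction: all function names below are hypothetical, and the hard-abort case merely stands in for a releaseAssert-style check; this is a sketch of the policy, not code from any subsystem.

```cpp
// Illustrative only: three different "correct" reactions to a failure,
// depending on the subsystem's contract.
#include <cstdlib>
#include <iostream>
#include <stdexcept>
#include <string>

// 1. Invariant violation in consensus-critical state: crash immediately
//    (releaseAssert-style) rather than limp along with corrupt state.
void
applyLedgerEntry(bool invariantHolds)
{
    if (!invariantHolds)
    {
        std::cerr << "invariant violated, aborting\n";
        std::abort();
    }
}

// 2. Bad external input: throw, and let the caller decide how to report it.
int
parseConfigValue(std::string const& s)
{
    if (s.empty())
    {
        throw std::invalid_argument("empty config value");
    }
    return std::stoi(s);
}

// 3. Expected, recoverable failure at a subsystem boundary: catch and retry
//    or degrade gracefully instead of propagating.
static bool
doNetworkFetch(int attempt)
{
    if (attempt == 0)
    {
        throw std::runtime_error("transient failure");
    }
    return true;
}

bool
fetchWithRetry(int attempts)
{
    for (int i = 0; i < attempts; ++i)
    {
        try
        {
            return doNetworkFetch(i);
        }
        catch (std::runtime_error const&)
        {
            // log and retry
        }
    }
    return false;
}
```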
> - Are error messages clear and actionable?
> - Is error recovery handled correctly?
> - Are exceptions used appropriately (if at all)?
Also, I'm tempted to add a section on fsync and file system stuff specifically, just because it's been such a pain, but I'm not sure if it's too specific...
**Atomic file patterns:**
- Look for write-to-temp + rename patterns
- Verify temp file is fsynced before rename
- Verify directory is fsynced after rename
- Check: what happens if crash occurs between steps?
**Questions to ask:**
- If power is lost mid-operation, what state is on disk?
- Can the operation be resumed/retried after crash?
- Are there ordering dependencies between file writes?
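A heavily simplified sketch of that write-to-temp + rename pattern with both fsyncs, using raw POSIX calls; this illustrates the technique only and is not necessarily how stellar-core's own file utilities are written.

```cpp
// Atomic file write: write temp -> fsync temp -> rename -> fsync directory.
#include <fcntl.h>
#include <unistd.h>

#include <cstdio>
#include <stdexcept>
#include <string>

void
atomicWrite(std::string const& dir, std::string const& name,
            std::string const& data)
{
    std::string tmp = dir + "/" + name + ".tmp";
    std::string dst = dir + "/" + name;

    int fd = ::open(tmp.c_str(), O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
    {
        throw std::runtime_error("open failed");
    }
    if (::write(fd, data.data(), data.size()) !=
        static_cast<ssize_t>(data.size()))
    {
        ::close(fd);
        throw std::runtime_error("short write");
    }
    // 1. fsync the temp file so its contents are durable before the rename.
    if (::fsync(fd) != 0)
    {
        ::close(fd);
        throw std::runtime_error("fsync failed");
    }
    ::close(fd);

    // 2. rename is atomic within a filesystem: readers see the old file or the
    //    new one, never a half-written file.
    if (::rename(tmp.c_str(), dst.c_str()) != 0)
    {
        throw std::runtime_error("rename failed");
    }

    // 3. fsync the directory so the rename itself survives a crash.
    int dfd = ::open(dir.c_str(), O_RDONLY | O_DIRECTORY);
    if (dfd >= 0)
    {
        ::fsync(dfd);
        ::close(dfd);
    }
}
```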
> - **Double-free**: Freeing or deleting a resource twice.
> - **Boolean logic errors**: Wrong operator precedence, De Morgan's law mistakes, inverted conditions.
> - **String formatting mismatches**: Format specifier doesn't match argument type
Should we add misuse of C++ containers and the standard library in general (wrong parameters, UB, etc.)?
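Tiny illustrative examples of a few of these categories, including the container-misuse case; this is made-up code, not taken from the skill or the codebase.

```cpp
// Illustrative bug patterns only; all names are hypothetical.
#include <cstdint>
#include <cstdio>
#include <vector>

void
reviewExamples(std::vector<int>& values, std::uint64_t id, bool a, bool b)
{
    // Boolean precedence: `a || b && !values.empty()` parses as
    // `a || (b && !values.empty())`; parenthesize to get the intended grouping.
    bool ready = (a || b) && !values.empty();
    (void)ready;

    // Format specifier mismatch: "%d" with a 64-bit value is undefined
    // behavior; cast and use the matching specifier.
    std::printf("id=%llu\n", static_cast<unsigned long long>(id));

    // Container misuse: erase() invalidates the iterator; use its return value
    // instead of incrementing a now-dangling iterator.
    for (auto it = values.begin(); it != values.end();)
    {
        if (*it == 0)
        {
            it = values.erase(it);
        }
        else
        {
            ++it;
        }
    }
}
```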
> - NEVER run from a subdirectory
> - NEVER run with `make -C <somedir>` for any other directory
> - NEVER run `cargo` manually, let `make` run it
> - NEVER edit `Makefile` or `Makefile.in`, only ever edit `Makefile.am`
It also helps to run the make command with `| tail -N` to avoid dumping gigantic make output into the context.
> 2. Adding any missing tests
> 3. Running tests
> 4. Multiple configurations (catches config-specific issues)
> 5. Formatting last (mechanical cleanup)
@MonsieurNicolas recently pointed out a cool trick for producing a clean commit history without changing the code: https://x.com/steveruizok/status/2014095245198176744
Not sure how well it works for core, but it might be worth adding for experimentation.
> - NEVER run from a subdirectory
> - NEVER run with `make -C <somedir>` for any other directory
> - NEVER run `cargo` manually, let `make` run it
> - NEVER edit `Makefile` or `Makefile.in`, only ever edit `Makefile.am`
I've found it helpful to include instructions for autogen and configure; otherwise it gets really confused by build errors encountered when adding a new file. Maybe something like:
- ALWAYS stage new files when they are first created
- ALWAYS run `./autogen.sh && ./configure --enable-ccache --enable-sdfprefs --enable-tracy --enable-tracy-gui --enable-tracy-capture` (my personal default flags) when adding new files initially
- NEVER run `./autogen.sh` or `./configure` unless you have created a new file
Minor, but it is a little faster to use `make-mks` instead of autogen and configure when adding new files.
> - **Unused variables**: Variables declared but never used.
> - **Unused includes**: Headers included but nothing from them appears to be used in the changed code.
> - **Dead code**: Code that can never execute (after unconditional return, etc.).
Maybe east-const here? I'm not sure where to put it since I haven't used lower-level skills like this before, but I have a style guide in my agents.md with a few things like RAII always, only smart pointers, no raw pointers, east const, C++17.
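For what it's worth, a small hypothetical sketch of those style points together: east const, smart pointers instead of raw owning pointers, and RAII (C++17). The class and members are made up for illustration.

```cpp
// Hypothetical example of the style points: east const, no raw owning
// pointers, RAII ownership.
#include <memory>
#include <string>

class Session
{
    std::string const mName;      // east const on members and parameters
    std::unique_ptr<int> mHandle; // owning pointer is a smart pointer

  public:
    explicit Session(std::string const& name)
        : mName(name), mHandle(std::make_unique<int>(0))
    {
    }
    // No destructor needed: members clean up after themselves (RAII).
};
```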
Since there was a lot more feedback than I anticipated on #5111, I've split out the most important step of it for now: fixing the devcontainer to actually work with noble and support our current build. The devcontainer literally doesn't work currently, so this is fairly high-value to land ASAP.
(force-pushed from 1dda725 to de620a9)
Sorry, I was testing with this locally and accidentally pushed something here instead of to my fork; it should be reverted now.
This PR does 3 main things:
I don't think there's much to object to in the skills, and I am also not all that certain that Copilot will use them as actively as we'd like; on the other hand, it roughly seems to work in my local testing and enables some behaviors that I was otherwise telling it over and over / watching it do wrong. So I think it's probably worth starting off with something like this. If we find them making things worse, they can always be edited back down or removed.