Devices Supported: wormhole_b0
tt-npe simulates the behavior of an abstract NoC "workload" running on a
simulated Tenstorrent device. A workload corresponds closely to a trace of all
calls to the dataflow_api (e.g. noc_async_read, noc_async_write, ...).
tt-npe can also act as a profiler/debugger for NoC traces, integrating with the tt-metal profiler's device noc trace capture feature and the ttnn-visualizer's new NPE mode. See the Profiler Mode section below for more information!
tt-npe can work with both:
- Predefined workloads defined in files
  - noc trace JSON files extracted using the tt-metal profiler's noc event capture feature
  - workload files with a simplified format more amenable to generation by other tools
- Workloads constructed programmatically using the Python bindings
Demo video: tt-npe-demo.mov
git clone git@github.com:tenstorrent/tt-npe.git
cd tt-npe/
./build-npe.sh

To update an existing checkout and rebuild:

cd tt-npe/ && git pull && ./build-npe.sh

Note: ENV_SETUP must be source'd after sourcing the tt-metal Python virtualenv setup.

source ENV_SETUP

The tt_metal device profiler can collect detailed traces of all noc events for
analysis by tt-npe. This will work out of the box for regular ttnn
models/ops. Pure tt_metal executables must call
tt::tt_metal::DumpDeviceProfileResults().
tt_metal/tools/profiler/profile_this.py --collect-noc-traces -c 'pytest command/to/trace.py' -o output_dir

tt-npe data should be automatically added to the ops perf report CSV in
output_dir/reports/. The new columns corresponding to tt-npe data are 'DRAM BW UTIL' and 'NOC UTIL'.
Additionally, the raw noc traces are dumped to output_dir/.logs/, and can be
further analyzed without additional profiler runs.
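
For a quick spot-check of these columns, the short sketch below summarizes them with pandas. This is not part of the tt-npe tooling; pandas and the glob over output_dir/reports/ are assumptions, and the snippet simply scans whatever report CSVs it finds there.

```python
import glob

import pandas as pd

# Look at every CSV report produced under output_dir/reports/ and summarize
# the tt-npe columns if they are present.
for report in glob.glob("output_dir/reports/*.csv"):
    df = pd.read_csv(report)
    util_cols = [c for c in ("DRAM BW UTIL", "NOC UTIL") if c in df.columns]
    if util_cols:
        print(report)
        print(df[util_cols].describe())
```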
See noc trace format for more details on the format of the noc traces.
npe_analyze_noc_trace_dir.py my_output_directory/.logs/ -e

ttnn-visualizer JSON inputs are dumped to the subdirectory output_dir/npe_stats/. Note
that these simulation timeline files are also JSON files, but use a different format
than the noc trace JSON.
See ttnn-visualizer for more details on installation and use.
Everything is installed to tt-npe/install/, including:
- Shared library (install/lib/libtt_npe.so)
- C++ API headers (install/include/)
- Python CLI using pybind11 (install/bin/tt_npe.py)
- C++ CLI (install/bin/tt_npe_run)
tt-npe has two unit test suites: one for C++ code and one for Python.
$ tt_npe/scripts/run_ut.sh # can be run from any pwd
Run the following to add the tt-npe install dir to your $PATH:
cd tt-npe/
source ENV_SETUP # add <tt_npe_root>/install/bin/ to $PATH

Now run the following:

tt_npe.py -w tt_npe/workload/example_wl.json

Note: the -w argument is required, and specifies the JSON workload file to load.
Bandwidth derating caused by congestion between concurrent transfers is
modelled by default. Congestion modelling can be disabled using
--cong-model none.
The -e option dumps detailed information about the simulation timeline (e.g.
congestion and transfer state for each timestep) into a JSON file located at
npe_stats.json (by default). Future work is to load this data into a
visualization tool, but it can also be used for ad-hoc analysis.
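
For example, the file can be loaded directly in Python for ad-hoc exploration. The minimal sketch below makes no assumptions about the internal schema of npe_stats.json and only reports its top-level layout.

```python
import json

with open("npe_stats.json") as f:
    stats = json.load(f)

# Report the top-level layout without assuming anything about the schema.
if isinstance(stats, dict):
    for key, value in stats.items():
        print(f"{key}: {type(value).__name__}")
elif isinstance(stats, list):
    print(f"top-level list with {len(stats)} entries")
else:
    print(type(stats).__name__)
```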
See tt_npe.py --help for more information about available options.
tt-npe workloads are composed of collections of Transfers. Each Transfer
represents a series of back-to-back packets sent from one source to one or more
destinations. This is roughly equivalent to a single call to one of the dataflow APIs
noc_async_read and noc_async_write.
Transfers are grouped hierarchically (see diagram). Each workload is a
collection of Phases, and each Phase is a group of Transfers.
For most modelling scenarios, putting all Transfers in a single monolithic
Phase is the correct approach. The purpose of multiple Phases is to
express the data dependencies and synchronization common in real workloads; however,
full support for this is not yet complete.
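
To make the hierarchy concrete, the sketch below assembles a single-Phase workload as a plain Python dict and writes it out as JSON. It is illustrative only: every field name is a hypothetical placeholder, and tt_npe/workload/example_wl.json remains the reference for the real file format.

```python
import json

# Field names below are hypothetical placeholders, NOT the real tt-npe schema;
# consult tt_npe/workload/example_wl.json for the actual workload file format.
workload = {
    "phases": [
        {
            "transfers": [
                # One source streaming back-to-back packets to one destination,
                # roughly analogous to a single noc_async_write call.
                {"src": [1, 1], "dst": [1, 5], "packet_size": 2048, "num_packets": 8},
                {"src": [2, 3], "dst": [2, 7], "packet_size": 4096, "num_packets": 4},
            ]
        }
    ]
}

with open("my_workload.json", "w") as f:
    json.dump(workload, f, indent=2)
```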
See install/bin/programmatic_workload_generation.py for an annotated example of
generating and simulating a tt-npe NoC workload via the Python bindings.
Open tt_npe/doc/tt_npe_pybind.html for full documentation of the tt-npe Python API.
The C++ API requires:
- Including the header install/include/npeAPI.hpp
- Linking to the shared lib libtt_npe.so
See the example C++ based CLI source code within tt-npe/tt_npe/cli/. This
links libtt_npe.so as a shared library, and serves as a reference for
interacting with the API.
tt-npe does not currently model the following; features marked with a * are being prioritized.
- *Blackhole device support
- User defined data dependencies
- Ethernet
- Multichip traffic
