Your app's logic lives in an entrypoint file — inference.py (Python) or src/inference.js (Node.js).
Structure
```python
from inferencesh import BaseApp, BaseAppInput, BaseAppOutput
from pydantic import Field

class AppSetup(BaseAppInput):
    model_id: str = Field(default="gpt2", description="Model to load")

class AppInput(BaseAppInput):
    # Define inputs here
    pass

class AppOutput(BaseAppOutput):
    # Define outputs here
    pass

class App(BaseApp):
    async def setup(self, config: AppSetup):
        # Runs once when worker starts or config changes
        pass

    async def run(self, input_data: AppInput, metadata) -> AppOutput:
        # Runs for each request
        pass

    async def unload(self):
        # Runs on shutdown
        pass
```
Defining inputs
```python
class AppInput(BaseAppInput):
    prompt: str = Field(description="What to generate")
    style: str = Field(default="modern", description="Style to use")
    count: int = Field(default=1, description="How many to generate")
```
Field types
| Type | Python | Node.js (Zod) |
|---|---|---|
| Text | str | z.string() |
| Number | int / float | z.number() |
| Boolean | bool | z.boolean() |
| File | File | z.object({ path: z.string() }) |
| Optional | Optional[T] | z.string().optional() |
| Array | List[T] | z.array(z.string()) |
| Enum | Literal["a", "b"] | z.enum(["a", "b"]) |
Defining outputs
```python
class AppOutput(BaseAppOutput):
    result: str = Field(description="Generated text")
    image: File = Field(description="Generated image")
```
The run method
This is where your logic goes:
```python
async def run(self, input_data: AppInput, metadata) -> AppOutput:
    # Log progress
    metadata.log("Processing...")

    # Do work
    result = process(input_data.prompt)

    # Return output
    return AppOutput(result=result)
```
Setup for models
Load heavy resources in setup. Use setup schemas to define configurable parameters:
```python
class AppSetup(BaseAppInput):
    model_id: str = Field(default="gpt2", description="Model to load")
    precision: str = Field(default="fp16", description="Model precision")

class App(BaseApp):
    async def setup(self, config: AppSetup):
        from transformers import AutoModel
        self.model = AutoModel.from_pretrained(config.model_id)
```
This runs once per configuration. If setup values change between requests, the app re-initializes.
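The "once per configuration" rule can be pictured with a small stand-in harness. This is only a sketch under stated assumptions: `FakeApp`, `Worker`, and the dict-based config are hypothetical names invented for illustration and are not part of the inferencesh SDK, whose actual re-initialization mechanism is not described here.

```python
import asyncio


class FakeApp:
    """Hypothetical stand-in for an app; counts how often setup runs."""

    def __init__(self):
        self.setup_calls = 0
        self.model_id = None

    async def setup(self, config: dict):
        # Pretend this is the expensive model load.
        self.setup_calls += 1
        self.model_id = config["model_id"]

    async def run(self, prompt: str) -> str:
        return f"{self.model_id}: {prompt}"


class Worker:
    """Hypothetical harness: calls setup only when the config changes."""

    def __init__(self, app: FakeApp):
        self.app = app
        self.active_config = None  # config used by the most recent setup call

    async def handle(self, config: dict, prompt: str) -> str:
        if config != self.active_config:
            await self.app.setup(config)
            self.active_config = dict(config)
        return await self.app.run(prompt)


async def demo():
    app = FakeApp()
    worker = Worker(app)
    outputs = [
        await worker.handle({"model_id": "gpt2"}, "a"),
        await worker.handle({"model_id": "gpt2"}, "b"),        # same config: setup skipped
        await worker.handle({"model_id": "distilgpt2"}, "c"),  # changed config: setup runs again
    ]
    return app.setup_calls, outputs


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

In this sketch three requests trigger only two setup calls, because the second request reuses the configuration of the first.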
Multi-function apps
Apps can expose multiple functions, each with its own input and output types.
```python
from pydantic import BaseModel

class GreetInput(BaseModel):
    name: str = "World"

class GreetOutput(BaseModel):
    message: str

class ReverseInput(BaseModel):
    text: str

class ReverseOutput(BaseModel):
    reversed_text: str

class App:
    async def run(self, input_data: GreetInput) -> GreetOutput:
        return GreetOutput(message=f"Hello, {input_data.name}!")

    async def reverse(self, input_data: ReverseInput) -> ReverseOutput:
        return ReverseOutput(reversed_text=input_data.text[::-1])
```
Python: Functions are discovered automatically if they have type hints using Pydantic models.
Node.js: Functions are discovered automatically if matching {PascalName}Input and {PascalName}Output Zod schemas are exported.
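On the Python side, one way such discovery could work is to inspect each public method's type hints. The sketch below is an assumption about the mechanism, not the SDK's actual implementation: `discover_functions` is a hypothetical helper, and a plain marker class `Model` stands in for Pydantic's `BaseModel` so the example needs only the standard library.

```python
import inspect
from typing import get_type_hints


class Model:
    """Stand-in for pydantic.BaseModel so this sketch has no dependencies."""


class GreetInput(Model): ...
class GreetOutput(Model): ...
class ReverseInput(Model): ...
class ReverseOutput(Model): ...


class App:
    async def run(self, input_data: GreetInput) -> GreetOutput: ...
    async def reverse(self, input_data: ReverseInput) -> ReverseOutput: ...
    async def helper(self, text: str) -> str: ...  # not model-typed: skipped
    def _private(self, input_data: GreetInput) -> GreetOutput: ...  # underscore: skipped


def discover_functions(app_cls) -> dict:
    """Map method name -> (input model, output model) for model-typed public methods."""
    found = {}
    for name, func in inspect.getmembers(app_cls, inspect.isfunction):
        if name.startswith("_"):
            continue
        hints = get_type_hints(func)
        output_type = hints.pop("return", None)
        input_types = list(hints.values())
        if len(input_types) != 1:
            continue
        candidates = (input_types[0], output_type)
        if all(isinstance(t, type) and issubclass(t, Model) for t in candidates):
            found[name] = candidates
    return found


functions = discover_functions(App)
```

Here `functions` contains entries for `run` and `reverse` only; `helper` is skipped because its hints are not model classes, and `_private` is skipped by the underscore convention chosen for this sketch.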
Calling functions
```shell
curl -X POST https://api.inference.sh/v1/apps/{app_id}/run \
  -d '{"function": "reverse", "input": {"text": "hello"}}'
```
Working with files
Input files are downloaded for you:
```python
image_path = input_data.image.path
```
Output files are uploaded for you:
```python
return AppOutput(image=File(path="/tmp/output.png"))
```
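Put together, a file-processing step reads from the input's local path and returns a new file by path. The following is a minimal sketch, assuming only that a file object exposes a `path` attribute as shown above; the `File` dataclass here is a stand-in for the SDK type, and `uppercase_file` is a hypothetical helper, so the example runs without the SDK and performs no download or upload.

```python
import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass
class File:
    """Stand-in for the SDK's File type; only the `path` attribute is modeled."""
    path: str


def uppercase_file(source: File) -> File:
    """Read the input file from its local path and write the result to a new file."""
    text = Path(source.path).read_text(encoding="utf-8")
    out_path = Path(tempfile.mkdtemp()) / "output.txt"
    out_path.write_text(text.upper(), encoding="utf-8")
    return File(path=str(out_path))


# Simulate an input file that has already been downloaded to local disk.
in_path = Path(tempfile.mkdtemp()) / "input.txt"
in_path.write_text("hello", encoding="utf-8")
result = uppercase_file(File(path=str(in_path)))
```

Writing to a fresh temporary directory rather than a fixed path such as `/tmp/output.png` avoids collisions if several requests run on the same worker; whether that matters depends on how the runtime schedules requests.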