- The solution is built with security as a priority: the managed NGINX ingress controller performs SSL termination and is fronted by Azure Application Gateway.
- Both the inferencing and embedding APIs are exposed through Azure API Management (APIM), as sketched below.
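As a rough sketch of the two points above (resource names and the OpenAPI spec URL are placeholders, not values from this repo), the managed NGINX ingress add-on can be enabled on the cluster and an API imported into APIM from the Azure CLI:

```bash
# Enable the managed NGINX ingress controller (application routing add-on).
az aks approuting enable \
  --resource-group <your-resource-group-name> \
  --name <your-aks-cluster-name>

# Import the inferencing API into API Management from its OpenAPI spec.
# <your-apim-name> and the spec URL are assumptions for illustration.
az apim api import \
  --resource-group <your-resource-group-name> \
  --service-name <your-apim-name> \
  --api-id inference \
  --path inference \
  --specification-format OpenApi \
  --specification-url https://<your-ingress-host>/openapi.json
```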
NOTE: To deploy N-series (GPU) VMs, your subscription must have quota approved for the N-series VM family.
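You can check the quota currently available in your region before deploying, for example:

```bash
# List compute quota in the target region and filter for NC-family (GPU) SKUs.
az vm list-usage --location <your-location> --output table | grep "NC"
```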
- Automated scaling of the AKS cluster based on load (see the autoscaler sketch after this list)
- Resource management and cost optimization compared to PTUs (provisioned throughput units)
- High availability through self-healing
- Edge computing with inferencing at the edge
- Secure and compliant with data residency requirements
- Streamlined deployment and management
- Observability and monitoring
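For the autoscaling item above, a minimal sketch of enabling the cluster autoscaler on the GPU node pool (the pool name `gpunp` is an assumption):

```bash
# Let AKS scale the GPU node pool between 1 and 3 nodes based on pending pods.
az aks nodepool update \
  --resource-group <your-resource-group-name> \
  --cluster-name <your-aks-cluster-name> \
  --name gpunp \
  --enable-cluster-autoscaler \
  --min-count 1 \
  --max-count 3
```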
- Enable mTLS between APIM and the NGINX ingress controller; one way to implement this is sketched below.
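A minimal sketch of the NGINX side, assuming APIM presents a client certificate signed by a CA you control. The namespace `llm`, secret name, and ingress name are placeholders; these are standard ingress-nginx annotations, so confirm the managed add-on honors them in your setup:

```bash
# Store the CA certificate that signed APIM's client certificate.
kubectl create secret generic apim-client-ca --from-file=ca.crt=ca.crt -n llm

# Tell NGINX to require and verify a client certificate against that CA.
kubectl annotate ingress llm-ingress -n llm \
  nginx.ingress.kubernetes.io/auth-tls-secret=llm/apim-client-ca \
  nginx.ingress.kubernetes.io/auth-tls-verify-client=on
```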
- Install the NVIDIA device plugin for Kubernetes (k8s-device-plugin)
- Install KubeRay for distributed inference
- To view the Ray dashboard:

```bash
kubectl port-forward service/${RAYCLUSTER_NAME}-head-svc 8265:8265
```
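Once the port-forward is running, the dashboard is available at http://localhost:8265.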
- Create a resource group:

```bash
az group create --name <your-resource-group-name> --location <your-location>
```
- Create the infrastructure using Bicep:

```bash
az deployment group create --resource-group <your-resource-group-name> --template-file init.bicep
```
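Optionally, preview the changes the template would make before deploying:

```bash
az deployment group what-if --resource-group <your-resource-group-name> --template-file init.bicep
```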
- Connect to the AKS cluster:

```bash
az aks get-credentials --resource-group <your-resource-group-name> --name <your-aks-cluster-name>
```
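To confirm kubectl is pointed at the right cluster, list the nodes; the GPU node pool should show as Ready:

```bash
kubectl get nodes -o wide
```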
- Install the NVIDIA device plugin for Kubernetes:

```bash
kubectl apply -f nvidia-device-plugin.yml
```
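To verify the plugin registered the GPUs, check that each GPU node reports allocatable `nvidia.com/gpu` capacity:

```bash
kubectl get nodes "-o=custom-columns=NAME:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu"
```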
- Install KubeRay for distributed inference:

```bash
helm repo add kuberay https://ray-project.github.io/kuberay-helm/
helm repo update
# Install both the CRDs and the KubeRay operator v1.3.0.
helm install kuberay-operator kuberay/kuberay-operator --version 1.3.0
```
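A quick check that the operator came up (assuming the default release name from the command above):

```bash
# The Helm release creates a deployment named kuberay-operator.
kubectl get deployment kuberay-operator
kubectl get pods
```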
- Deploy the LLM:

```bash
kubectl apply -f raysvc-llama3-8b-A100.yaml
```
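Before testing, you can watch the RayService converge; the head and worker pods should reach Running:

```bash
# Resource names come from raysvc-llama3-8b-A100.yaml.
kubectl get rayservice
kubectl get pods
```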
- Test the deployment:

```bash
kubectl port-forward svc/<NAME> 8000
curl http://localhost:8000/v1/chat/completions -H "Content-Type: application/json" -d '{
    "model": "meta-llama/Meta-Llama-3-8B-Instruct",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Provide a brief sentence describing the Ray open-source project."}
    ],
    "temperature": 0.7
}'
```
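The endpoint is OpenAI-compatible, so a successful response should be a chat completion JSON object with the model's answer in `choices[0].message.content`.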