
Finetuning to Inference: Serve Open Models with Tensorfuse & AWS

Hosted by Agam Jain, Samagra Sharma & Omar Valle
Registration
Approval Required
Your registration is subject to approval by the host.
Welcome! To join the event, please register below.
About Event

Join Tensorfuse and AWS at the GenAI Loft in San Francisco for a hands-on workshop on fine-tuning and deploying open source LLMs like Llama, Qwen, and DeepSeek in your AWS VPC.

Unlike other events, this will be a co-build session where you will:

  • Use Tensorfuse to provision an EKS cluster in your AWS VPC, optimized for running AI workloads.

  • Customize open source LLMs by fine-tuning LoRA adapters (a rough sketch follows this list)

  • Deploy the fine-tuned model and run inference using vLLM (see the inference sketch below)
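
To give a feel for the fine-tuning step, here is a minimal LoRA sketch using Hugging Face transformers and peft. The model name, dataset file, and hyperparameters are illustrative placeholders, not the workshop's exact setup:

    # Minimal LoRA fine-tuning sketch (illustrative, not the workshop's exact code).
    from datasets import load_dataset
    from peft import LoraConfig, get_peft_model
    from transformers import (AutoModelForCausalLM, AutoTokenizer, Trainer,
                              TrainingArguments, DataCollatorForLanguageModeling)

    base = "meta-llama/Llama-3.1-8B"  # placeholder; any supported open model works
    tokenizer = AutoTokenizer.from_pretrained(base)
    tokenizer.pad_token = tokenizer.eos_token
    model = AutoModelForCausalLM.from_pretrained(base)

    # Attach small trainable LoRA adapters to the attention projections;
    # the base model weights stay frozen.
    model = get_peft_model(model, LoraConfig(
        r=16, lora_alpha=32, lora_dropout=0.05,
        target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM"))

    # Tokenize a tiny text dataset (replace train.txt with your own data).
    data = load_dataset("text", data_files={"train": "train.txt"})["train"]
    data = data.map(lambda x: tokenizer(x["text"], truncation=True, max_length=512))

    Trainer(
        model=model,
        args=TrainingArguments(output_dir="lora-out", num_train_epochs=1,
                               per_device_train_batch_size=1, learning_rate=2e-4),
        train_dataset=data,
        data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
    ).train()
    model.save_pretrained("lora-out/adapter")  # saves only the adapter weights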

Best Part: For this workshop, we will be using readily available on-demand GPUs like A10G and L40S on AWS.
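
Once an adapter is trained, serving it on one of these GPUs takes only a few lines with vLLM. A minimal sketch, reusing the hypothetical adapter path from the fine-tuning sketch above (on a 24 GB A10G, a modest max_model_len helps the KV cache fit):

    # Minimal vLLM inference sketch with a LoRA adapter (paths are placeholders).
    from vllm import LLM, SamplingParams
    from vllm.lora.request import LoRARequest

    llm = LLM(model="meta-llama/Llama-3.1-8B", enable_lora=True,
              max_model_len=4096)  # keep context modest to fit a 24 GB A10G
    out = llm.generate(
        ["Summarize what this workshop covers in one sentence."],
        SamplingParams(temperature=0.7, max_tokens=128),
        lora_request=LoRARequest("my-adapter", 1, "lora-out/adapter"),
    )
    print(out[0].outputs[0].text)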

Agenda

  • 2:00-2:30 p.m. | Check-in and a brief intro to Tensorfuse

  • 2:30-4:30 p.m. | Deploy-athon: we'll configure the Tensorfuse runtime in your AWS account, then fine-tune and deploy Llama models

  • 4:30-5:00 p.m. | Networking, Snacks and Drinks!

Prerequisites

Everyone interested in customizing and self-hosting LLMs is welcome. However, you’ll get the most from this event if you:

  • Have an active AWS account

  • Have sufficient on-demand GPU quotas for G and VT instances (ideally in us-east-1 or us-west-2). Check your quotas here, or programmatically via the sketch after this list.

  • Have permissions to configure resources in your AWS account.
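
If you'd like to check the GPU quota programmatically, here is a small boto3 sketch. The quota code L-DB2E81BA is our assumption for EC2's "Running On-Demand G and VT instances"; please verify it in the Service Quotas console:

    # Check the on-demand G/VT instance quota via the Service Quotas API.
    # The quota code below is an assumption; confirm it in the AWS console.
    import boto3

    client = boto3.client("service-quotas", region_name="us-east-1")
    quota = client.get_service_quota(ServiceCode="ec2", QuotaCode="L-DB2E81BA")
    print(quota["Quota"]["QuotaName"], "=", quota["Quota"]["Value"], "vCPUs")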

We have limited spots (~100), so book your seat now.

Excited to host you all at the GenAI Loft in SF!

Location
525 Market St
San Francisco, CA 94105, USA
AWS Office, 19th Floor