TensorFuse (YC W24)

Software Development

San Francisco, California 783 followers

Auto-evaluate your production LLM applications.

About us

Run serverless GPUs on private cloud

Website
https://tensorfuse.io/
Industry
Software Development
Company size
2-10 employees
Headquarters
San Francisco, California
Type
Privately Held
Founded
2023

Updates

  • TensorFuse (YC W24) reposted this

    Agam Jain, Founder at TensorFuse (YC W24)

    Lately, I've been writing a lot about running serverless GPUs, but very few people understand how hard it is to build. Here's some of what we are doing under the hood:

    1. Rewriting our own Docker from scratch
    2. Rewriting our own file system
    3. Making it all compatible with Kubernetes and autoscalers like Karpenter, Knative, etc.
    4. Doing all of this while operating within the deep, dark forest called AWS

    All of this to solve for cold starts and bring them down to under 5 seconds for any type of model image, from Llama3 to Stable Diffusion. If you are a systems engineer with experience in Go or Rust, come join us at Tensorfuse. We're a lean team building the next generation of serverless computing!

  • TensorFuse (YC W24) reposted this

    Agam Jain, Founder at TensorFuse (YC W24)

    Deploying open-source models like Llama3, Mixtral, Gemma, etc. in production, but facing difficulty setting up Kubernetes, Karpenter, CI/CD, etc. for GPU nodes? At Tensorfuse, we've solved most of these issues. We're building a serverless runtime that operates on your own cloud (AWS/Azure/GCP). It offers the ease and speed of serverless, along with the flexibility and control of your own infra. Here are some of our most loved features:

    1. Customizable environments: specify container images and hardware specifications using simple Python, no YAML required (a rough sketch of the idea follows below).
    2. Autoscaling: scale GPU workers from zero to hundreds in seconds to meet user demand in real time.
    3. OpenAI compatibility: start using your deployment through an OpenAI-compatible endpoint.

    It took us just 30 minutes to deploy Llama3 on our own AWS account using Tensorfuse. The best part is that all of this can be done directly from your CLI, eliminating the need for context switching. Check out our website for more details. Link is in the comments!
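
    As a rough illustration of feature 1 (defining the environment in Python rather than YAML), here is a self-contained sketch. It is not Tensorfuse's actual SDK; the class names, fields, and values are placeholders that only show what a Python-first spec can look like.

    ```python
    # Self-contained sketch (NOT Tensorfuse's actual SDK) of describing a container
    # image and hardware in plain Python instead of YAML. Names are placeholders.
    from dataclasses import dataclass, asdict, field

    @dataclass
    class GPUSpec:
        kind: str = "A10G"   # GPU type to request
        count: int = 1       # GPUs per worker

    @dataclass
    class DeploymentSpec:
        name: str
        image: str                                    # container image to run
        gpu: GPUSpec = field(default_factory=GPUSpec)
        min_workers: int = 0                          # scale to zero when idle
        max_workers: int = 100                        # autoscaling upper bound

    # Example: a Llama3 inference deployment described entirely in Python.
    spec = DeploymentSpec(name="llama3-8b", image="vllm/vllm-openai:latest")
    print(asdict(spec))  # this dict could feed whatever deploy step comes next
    ```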

  • TensorFuse (YC W24)

    We are building the best ML deployment experience that sits on your own cloud! Ease and speed of serverless, flexibility and control of your own infra! Stay tuned for more updates 🎉

    Agam Jain, Founder at TensorFuse (YC W24)

    New Milestone: TensorFuse (YC W24) got featured at Times Square in NYC! Run serverless GPUs on your private cloud (AWS/Azure/GCP). Thanks to Brex for the shoutout! That animation with our logo is pretty cool.

  • TensorFuse (YC W24) reposted this

    Agam Jain, Founder at TensorFuse (YC W24)

    How do you increase GPU quotas in AWS? At TensorFuse (YC W24), we often get questions about how to raise GPU quota limits in AWS, which is critical for scaling ML workloads. This post explains the different types of ML instances in AWS and shows how to increase their quota limits.

    1️⃣ Understanding EC2 instances: AWS offers a variety of EC2 instances tailored to different workloads. For ML tasks, especially deep learning, "Accelerated Computing" instances are the right fit because of their GPUs. Examples include P3, P4, and G4 instances.

    2️⃣ Estimating your service quota limit: To avoid availability issues, it's worth applying for quota increases for several instance types across multiple regions, even if you already have quota somewhere. Work out the limit you need from your workload: for example, you need only one p4d.24xlarge instance, but six g5.4xlarge instances, to load the 8-bit quantized DBRX model.

    3️⃣ Automated Python script: Here's where the magic happens! We've written a Python script that automates applying for service quota increases across regions and instance types using the AWS SDK (Boto3), saving you the time and effort of doing it manually via the AWS console or CLI (a minimal sketch of the underlying API call follows below the link). Find the script here: https://lnkd.in/gQcFkPNi

    If you want us to do the same for Azure and GCP, leave a comment below. Happy computing!

    Increase GPU Quota on AWS with Automated Python Script - Tensorfuse

    tensorfuse.io
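
    The linked script is not reproduced here, but as a minimal sketch of the kind of call it automates, here is how a quota increase can be requested with boto3's Service Quotas API. The quota name, region, and desired value below are example placeholders; pick the ones that match your instance types and regions.

    ```python
    # Minimal sketch (not the linked script) of requesting an EC2 quota increase
    # via boto3's Service Quotas API. Region, quota name, and value are examples.
    import boto3

    REGION = "us-east-1"                                 # repeat per region you need
    QUOTA_NAME = "Running On-Demand G and VT instances"  # vCPU quota covering g5.4xlarge
    DESIRED_VCPUS = 96                                   # e.g. six g5.4xlarge = 6 * 16 vCPUs

    client = boto3.client("service-quotas", region_name=REGION)

    # Look up the quota code for the EC2 quota by its display name.
    quota_code = None
    for page in client.get_paginator("list_service_quotas").paginate(ServiceCode="ec2"):
        for quota in page["Quotas"]:
            if quota["QuotaName"] == QUOTA_NAME:
                quota_code = quota["QuotaCode"]

    if quota_code is not None:
        resp = client.request_service_quota_increase(
            ServiceCode="ec2",
            QuotaCode=quota_code,
            DesiredValue=float(DESIRED_VCPUS),
        )
        print(resp["RequestedQuota"]["Status"])  # e.g. PENDING or CASE_OPENED
    ```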

  • TensorFuse (YC W24) reposted this

    Agam Jain, Founder at TensorFuse (YC W24)

    Run serverless GPUs on private cloud - TensorFuse (YC W24)

    I'm interested in learning how teams use serverless GPU deployments for AI workloads on platforms like Modal, Cerebrium, banana.dev, etc. Are any teams considering moving their serverless GPU deployments to AWS in the future, whether for cost reasons or because they trust the major cloud providers more? We're building a serverless GPU stack that runs on your own cloud/infrastructure, and I'd like to talk with people who share this perspective. Feel free to DM me or drop an email at [email protected]

  • TensorFuse (YC W24) reposted this

    Agam Jain, Founder at TensorFuse (YC W24)

    Introducing Tensorfuse Deployments 🚀 With the latest update from Tensorfuse, you can now deploy and scale LLM pipelines on your own cloud in a few easy steps:

    - Connect your cloud provider (AWS, GCP, Azure, etc.)
    - Select the OSS model and machine type
    - Click deploy!

    Your model is immediately available for inference via an OpenAI-compatible API (a minimal sketch follows below). Here's what our customers have accomplished with Tensorfuse Deployments:

    - Reduced their LLM development cycle time by 90%
    - Optimized the use of their compute resources across cloud providers like AWS, Azure, and GCP
    - Ensured data privacy and security by deploying on their own cloud, without sending any data to third-party providers

    If you're planning to deploy LLM pipelines and want to avoid sending data to a third party, Tensorfuse can help you run them on your own infrastructure. If you're interested, email me at [email protected]. For more information, check out the demo video below.
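
    To make the "OpenAI-compatible API" point concrete, here is a minimal sketch of calling a self-hosted deployment with the standard openai Python client. The base URL, API key, and model name are placeholders for whatever your own deployment exposes.

    ```python
    # Minimal sketch of calling an OpenAI-compatible endpoint with the openai client.
    # The base_url, api_key, and model name are placeholders, not real values.
    from openai import OpenAI

    client = OpenAI(
        base_url="https://your-deployment.example.com/v1",  # your self-hosted endpoint
        api_key="sk-placeholder",                           # whatever auth your gateway expects
    )

    response = client.chat.completions.create(
        model="llama3-8b-instruct",  # the model you deployed
        messages=[{"role": "user", "content": "Explain serverless GPUs in one sentence."}],
        max_tokens=128,
    )
    print(response.choices[0].message.content)
    ```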

  • TensorFuse (YC W24)

    RAG pipelines are everywhere, and a lot of people are deploying them in production. However, after speaking with numerous companies, we have come to realize that building a naive RAG system is easy, but improving it and making it production grade is super hard. At our previous companies we deployed numerous RAG systems (back then, RAG was called natural language search), and we would like to share our insights here. In this article, we explore how to improve the retrieval subsystem of RAG pipelines, discuss common issues that occur at the retrieval stage, and provide practical solutions (a small reranking sketch follows below the link). Subscribe to our blog to stay up to date on the ever-evolving LLM landscape.

    From Naive RAGs to Advanced: Improving your Retrieval

    blog.tensorfuse.io
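
    The full article is at the link above. As one flavor of the retrieval improvements it refers to, here is a minimal sketch of reranking retrieved chunks with a cross-encoder, a common fix when the top-k vector hits are noisy. The model name and documents are examples only, and this is not necessarily the exact approach the article takes.

    ```python
    # Minimal sketch of cross-encoder reranking, one common retrieval improvement.
    # Requires: pip install sentence-transformers. Model and docs are examples only.
    from sentence_transformers import CrossEncoder

    query = "How do I reduce GPU cold starts?"
    retrieved_chunks = [  # imagine these came back from your vector store's top-k search
        "Cold starts can be reduced by caching container images close to the node.",
        "Our pricing page lists per-second GPU billing.",
        "Lazy-loading model weights from a snapshot shortens startup time.",
    ]

    # Score each (query, chunk) pair jointly; this is usually more accurate than the
    # bi-encoder similarity used for the initial top-k retrieval.
    reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
    scores = reranker.predict([(query, chunk) for chunk in retrieved_chunks])

    # Keep only the highest-scoring chunks for the generation step.
    ranked = sorted(zip(scores, retrieved_chunks), reverse=True)
    for score, chunk in ranked[:2]:
        print(f"{score:.2f}  {chunk}")
    ```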

Funding

TensorFuse (YC W24): 1 total round

Last Round

Pre-seed

US$ 500.0K

Investors

Y Combinator