TensorFuse (YC W24)

Software Development

San Francisco, California 783 followers

Auto-evaluate your production LLM applications.

About us

Run serverless GPUs on private cloud

Website
https://tensorfuse.io/
Industry
Software Development
Company size
2-10 employees
Headquarters
San Francisco, California
Type
Privately Held
Founded
2023

Updates

  • TensorFuse (YC W24) reposted this

    Agam Jain, Founder at TensorFuse (YC W24)

    Lately, I've been writing a lot about running serverless GPUs, but very few people understand how hard it is to build. Here's some of what we are doing under the hood:

    1. Rewriting our own Docker from scratch
    2. Rewriting our own file system
    3. Making it all compatible with Kubernetes and autoscalers like Karpenter, Knative, etc.
    4. Doing all of this while operating within the deep, dark forest called AWS

    All of this to solve for cold starts and bring them down to under 5 seconds for any type of model image, from Llama3 to Stable Diffusion. If you are a systems engineer with experience in Go or Rust, come join us at Tensorfuse. We're a lean team building the next generation of serverless computing!

  • TensorFuse (YC W24) reposted this

    Agam Jain, Founder at TensorFuse (YC W24)

    Deploying open-source models like Llama3, Mixtral, Gemma, etc. in production, but facing difficulty setting up Kubernetes, Karpenter, CI/CD, etc. for GPU nodes? At Tensorfuse, we've solved most of these issues. We're building a serverless runtime that operates on your own cloud (AWS/Azure/GCP). It offers the ease and speed of serverless, along with the flexibility and control of your own infra. Here are some of our most loved features:

    1. Customizable environments: specify container images and hardware specifications using simple Python, no YAML required (a rough sketch of the idea follows below).
    2. Autoscaling: scale GPU workers from zero to hundreds in seconds to meet user demand in real time.
    3. OpenAI compatibility: start using your deployment through an OpenAI-compatible endpoint.

    It took us just 30 minutes to deploy Llama3 on our own AWS account using Tensorfuse. The best part is that all of this can be done directly from your CLI, eliminating the need for context switching. Check out our website for more details. Link is in the comments!
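
    As a rough illustration of feature 1 (defining the environment in Python rather than YAML), here is a self-contained sketch. It is not Tensorfuse's actual SDK; the class names, fields, and values are placeholders that only show what a Python-first spec can look like.

    ```python
    # Self-contained sketch (NOT Tensorfuse's actual SDK) of describing a container
    # image and hardware in plain Python instead of YAML. Names are placeholders.
    from dataclasses import dataclass, asdict, field

    @dataclass
    class GPUSpec:
        kind: str = "A10G"   # GPU type to request
        count: int = 1       # GPUs per worker

    @dataclass
    class DeploymentSpec:
        name: str
        image: str                                    # container image to run
        gpu: GPUSpec = field(default_factory=GPUSpec)
        min_workers: int = 0                          # scale to zero when idle
        max_workers: int = 100                        # autoscaling upper bound

    # Example: a Llama3 inference deployment described entirely in Python.
    spec = DeploymentSpec(name="llama3-8b", image="vllm/vllm-openai:latest")
    print(asdict(spec))  # this dict could feed whatever deploy step comes next
    ```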

  • TensorFuse (YC W24)

    We are building the best ML deployment experience that sits on your own cloud! Ease and speed of serverless, flexibility and control of your own infra! Stay tuned for more updates 🎉

    Agam Jain, Founder at TensorFuse (YC W24)

    New Milestone: TensorFuse (YC W24) got featured at Times Square in NYC! Run serverless GPUs on your private cloud (AWS/Azure/GCP). Thanks to Brex for the shoutout! That animation with our logo is pretty cool.

  • TensorFuse (YC W24) reposted this

    Agam Jain, Founder at TensorFuse (YC W24)

    How do you increase GPU quotas in AWS? At TensorFuse (YC W24), we often get questions about how to raise GPU quota limits in AWS, which is critical for scaling ML workloads. This post explains the different types of ML instances in AWS and shows how to increase their quota limits.

    1️⃣ Understanding EC2 instances: AWS offers a variety of EC2 instances tailored to different workloads. For ML tasks, especially deep learning, "Accelerated Computing" instances are the right fit because of their GPUs. Examples include P3, P4, and G4 instances.

    2️⃣ Estimating your service quota limit: To avoid availability issues, it's worth applying for quota increases for several instance types across multiple regions, even if you already have quota somewhere. Work out the limit you need from your workload: for example, you need only one p4d.24xlarge instance, but six g5.4xlarge instances, to load the 8-bit quantized DBRX model.

    3️⃣ Automated Python script: Here's where the magic happens! We've written a Python script that automates applying for service quota increases across regions and instance types using the AWS SDK (Boto3), saving you the time and effort of doing it manually via the AWS console or CLI (a minimal sketch of the underlying API call follows below the link). Find the script here: https://lnkd.in/gQcFkPNi

    If you want us to do the same for Azure and GCP, leave a comment below. Happy computing!

    Increase GPU Quota on AWS with Automated Python Script - Tensorfuse

    tensorfuse.io
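
    The linked script is not reproduced here, but as a minimal sketch of the kind of call it automates, here is how a quota increase can be requested with boto3's Service Quotas API. The quota name, region, and desired value below are example placeholders; pick the ones that match your instance types and regions.

    ```python
    # Minimal sketch (not the linked script) of requesting an EC2 quota increase
    # via boto3's Service Quotas API. Region, quota name, and value are examples.
    import boto3

    REGION = "us-east-1"                                 # repeat per region you need
    QUOTA_NAME = "Running On-Demand G and VT instances"  # vCPU quota covering g5.4xlarge
    DESIRED_VCPUS = 96                                   # e.g. six g5.4xlarge = 6 * 16 vCPUs

    client = boto3.client("service-quotas", region_name=REGION)

    # Look up the quota code for the EC2 quota by its display name.
    quota_code = None
    for page in client.get_paginator("list_service_quotas").paginate(ServiceCode="ec2"):
        for quota in page["Quotas"]:
            if quota["QuotaName"] == QUOTA_NAME:
                quota_code = quota["QuotaCode"]

    if quota_code is not None:
        resp = client.request_service_quota_increase(
            ServiceCode="ec2",
            QuotaCode=quota_code,
            DesiredValue=float(DESIRED_VCPUS),
        )
        print(resp["RequestedQuota"]["Status"])  # e.g. PENDING or CASE_OPENED
    ```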

  • TensorFuse (YC W24) reposted this

    Agam Jain, Founder at TensorFuse (YC W24)

    Run serverless GPUs on private cloud - TensorFuse (YC W24)

    I'm interested in learning how teams use serverless GPU deployments for AI workloads on platforms like Modal, Cerebrium, banana.dev, etc. Are any teams considering moving their serverless GPU deployments to AWS in the future, whether for cost reasons or because they trust the major cloud providers more? We're building a serverless GPU stack that runs on your own cloud/infrastructure, and I'd like to talk with people who share this perspective. Feel free to DM me or drop an email at [email protected]

  • TensorFuse (YC W24) reposted this

    Agam Jain, Founder at TensorFuse (YC W24)

    Introducing Tensorfuse Deployments 🚀 With the latest update from Tensorfuse, you can now deploy and scale LLM pipelines on your own cloud in a few easy steps:

    - Connect your cloud provider (AWS, GCP, Azure, etc.)
    - Select the OSS model and machine type
    - Click deploy!

    Your model is immediately available for inference via an OpenAI-compatible API (a minimal sketch follows below). Here's what our customers have accomplished with Tensorfuse Deployments:

    - Reduced their LLM development cycle time by 90%
    - Optimized the use of their compute resources across cloud providers like AWS, Azure, and GCP
    - Ensured data privacy and security by deploying on their own cloud, without sending any data to third-party providers

    If you're planning to deploy LLM pipelines and want to avoid sending data to a third party, Tensorfuse can help you run them on your own infrastructure. If you're interested, email me at [email protected]. For more information, check out the demo video below.
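
    To make the "OpenAI-compatible API" point concrete, here is a minimal sketch of calling a self-hosted deployment with the standard openai Python client. The base URL, API key, and model name are placeholders for whatever your own deployment exposes.

    ```python
    # Minimal sketch of calling an OpenAI-compatible endpoint with the openai client.
    # The base_url, api_key, and model name are placeholders, not real values.
    from openai import OpenAI

    client = OpenAI(
        base_url="https://your-deployment.example.com/v1",  # your self-hosted endpoint
        api_key="sk-placeholder",                           # whatever auth your gateway expects
    )

    response = client.chat.completions.create(
        model="llama3-8b-instruct",  # the model you deployed
        messages=[{"role": "user", "content": "Explain serverless GPUs in one sentence."}],
        max_tokens=128,
    )
    print(response.choices[0].message.content)
    ```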

  • TensorFuse (YC W24)

    RAG pipelines are everywhere, and a lot of people are deploying them in production. However, after speaking with numerous companies, we have come to realize that building a naive RAG system is easy, but improving it and making it production grade is super hard. At our previous companies we deployed numerous RAG systems (back then, RAG was called natural language search), and we would like to share our insights here. In this article, we explore how to improve the retrieval subsystem of RAG pipelines, discuss common issues that occur at the retrieval stage, and provide practical solutions (a small reranking sketch follows below the link). Subscribe to our blog to stay up to date on the ever-evolving LLM landscape.

    From Naive RAGs to Advanced: Improving your Retrieval

    blog.tensorfuse.io
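
    The full article is at the link above. As one flavor of the retrieval improvements it refers to, here is a minimal sketch of reranking retrieved chunks with a cross-encoder, a common fix when the top-k vector hits are noisy. The model name and documents are examples only, and this is not necessarily the exact approach the article takes.

    ```python
    # Minimal sketch of cross-encoder reranking, one common retrieval improvement.
    # Requires: pip install sentence-transformers. Model and docs are examples only.
    from sentence_transformers import CrossEncoder

    query = "How do I reduce GPU cold starts?"
    retrieved_chunks = [  # imagine these came back from your vector store's top-k search
        "Cold starts can be reduced by caching container images close to the node.",
        "Our pricing page lists per-second GPU billing.",
        "Lazy-loading model weights from a snapshot shortens startup time.",
    ]

    # Score each (query, chunk) pair jointly; this is usually more accurate than the
    # bi-encoder similarity used for the initial top-k retrieval.
    reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
    scores = reranker.predict([(query, chunk) for chunk in retrieved_chunks])

    # Keep only the highest-scoring chunks for the generation step.
    ranked = sorted(zip(scores, retrieved_chunks), reverse=True)
    for score, chunk in ranked[:2]:
        print(f"{score:.2f}  {chunk}")
    ```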

Funding

TensorFuse (YC W24): 1 total round

Last Round

Pre-seed

US$ 500.0K

Investors

Y Combinator