Deci AI (Acquired by NVIDIA)

About us

Deci enables deep learning to live up to its true potential by using AI to build better AI. With the company’s end-to-end deep learning acceleration platform, AI developers can build, optimize, and deploy faster and more accurate models for any environment, including cloud, edge, or mobile. With Deci’s platform, developers can increase deep learning model inference performance by 3x-15x, on any hardware, while still preserving accuracy. This translates directly into new use cases on limited hardware, substantially shorter development cycles, and reduced compute costs by up to 80%.

Industry: Software Development
Company size: 51-200 employees
Headquarters: Tel Aviv
Type: Privately Held
Founded: 2019
Specialties: Deep Learning, Machine Learning, and Artificial Intelligence

Updates

  • Are you keeping up with our 11 Days of Inference Acceleration Techniques? Here’s the next one: 🤖 Automate deep learning model engineering with neural architecture search (NAS). NAS automatically selects an appropriate architecture from a space of allowable architectures, using a search strategy guided by an objective evaluation scheme. NAS can produce outstanding results, but it’s challenging to implement effectively: a NAS search requires evaluating many architectures on a validation set, which demands a massive amount of compute. Many ideas have been proposed to reduce evaluation time, including low-fidelity performance estimation, weight inheritance, weight sharing, learning-curve extrapolation, network morphism, and single-shot training, one of the most popular options. A minimal sketch of the search loop follows below. Learn more about NAS and how to address its limitations from Eitan Fredman Ganeles here > https://lnkd.in/gHYYn93c – What’s the #11DaysofInferenceAccelerationTechniques? For 11 days, the Deci team is posting a series of inference acceleration techniques for deep learning applications. If you’re looking for practical tips and best practices for improving inference, follow Deci AI so you won’t miss an update. #deeplearning #machinelearning #neuralnetworks #computervision
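
A minimal sketch of the NAS loop described in the post, using plain random search over a toy two-parameter space; proxy_score is a hypothetical stand-in for a real low-fidelity estimate such as a short training run or a weight-sharing supernet lookup:

    import random

    # Toy search space: each architecture is a (depth, width) choice.
    SEARCH_SPACE = {"depth": [2, 4, 8, 16], "width": [32, 64, 128, 256]}

    def proxy_score(arch):
        # Hypothetical cheap estimate of validation performance; replace
        # with a short training run or a supernet evaluation in practice.
        return 1.0 / (1 + abs(arch["depth"] - 8)) + 1.0 / (1 + abs(arch["width"] - 128))

    def random_search(n_trials=50, seed=0):
        rng = random.Random(seed)
        best_arch, best_score = None, float("-inf")
        for _ in range(n_trials):
            # Sample a candidate architecture from the allowable space.
            arch = {k: rng.choice(v) for k, v in SEARCH_SPACE.items()}
            score = proxy_score(arch)  # cheap proxy instead of full training
            if score > best_score:
                best_arch, best_score = arch, score
        return best_arch, best_score

    print(random_search())

Real NAS systems swap the random sampler for evolutionary, reinforcement-learning, or gradient-based search strategies, but the sample-evaluate-select skeleton is the same.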

  • ☝️ Tips 1-5 assume the model is fixed, but things get more complicated when you can change the model. On Day 6, we’re exploring model-level optimizations, meaning any method that changes the model or the architecture itself. Since neural networks often contain many redundancies, exploiting or streamlining them is essential to accelerating inference. What can you do to optimize inference at the model level? Sixth tip: Compress models with model distillation. 🧑🏫 Widely applicable across AI domains, including NLP, speech recognition, visual recognition, and recommendation systems, model distillation is a training technique that transfers knowledge from a large teacher model so that a small student model approaches the larger model’s accuracy. 🔤 A classic example is the family of compressed BERT models that use knowledge distillation to shrink the large model into lightweight versions: BERT-PKD, TinyBERT, DistilBERT, and BERT-BiLSTM-based compression techniques all solve multilingual tasks with lightweight language models. A minimal distillation-loss sketch follows below. #deeplearning #machinelearning #neuralnetworks #computervision
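
A minimal distillation-loss sketch in PyTorch, combining a softened KL term against the teacher with standard cross-entropy against the labels; temperature T and mixing weight alpha are illustrative defaults, not values from the post:

    import torch
    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
        # Soft targets: match the teacher's temperature-softened distribution.
        soft = F.kl_div(
            F.log_softmax(student_logits / T, dim=-1),
            F.softmax(teacher_logits / T, dim=-1),
            reduction="batchmean",
        ) * (T * T)  # rescale to compensate for the temperature's effect on gradients
        # Hard targets: ordinary cross-entropy against ground-truth labels.
        hard = F.cross_entropy(student_logits, labels)
        return alpha * soft + (1 - alpha) * hard

    # Smoke test with random logits standing in for teacher/student outputs.
    student = torch.randn(8, 10)
    teacher = torch.randn(8, 10)
    labels = torch.randint(0, 10, (8,))
    print(distillation_loss(student, teacher, labels))

During training, the teacher is frozen (its forward pass wrapped in torch.no_grad()) and only the student’s parameters receive gradients.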

  • We’re at day 5 of actionable inference acceleration techniques for your deep learning applications. Today, we’re recommending the use of quantization. 🏋️ Quantization reduces the numerical representation (bit-width) of weights and activations, and can speed up runtime when the underlying hardware supports it. However, there are two important points to consider: 1️⃣ Not all models quantize equally. 2️⃣ Some models respond better to quantization than others. Quantization can degrade a model’s accuracy: low bit-width weights and activations lose information, distort the network representation, and circumvent the normal differentiation required for training. While this is rarely an issue when going from FP32 to FP16, today’s quantization techniques use INT8 and even INT4, where there’s often a 1% to 10% accuracy loss post-quantization, depending mostly on the model architecture and its ability to quantize. 💡 The solution: quantization-aware training and post-training quantization. A minimal post-training quantization sketch follows below. Learn more here > https://lnkd.in/gYvKPJBJ #deeplearning #machinelearning #neuralnetworks #computervision
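
A minimal post-training quantization sketch using PyTorch’s dynamic quantization, which stores Linear weights as INT8 and quantizes activations on the fly; the toy model is a placeholder for a real network:

    import torch
    import torch.nn as nn

    # Toy FP32 model standing in for a real network.
    model = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 10))
    model.eval()

    # Post-training dynamic quantization of all Linear layers to INT8.
    quantized = torch.ao.quantization.quantize_dynamic(
        model, {nn.Linear}, dtype=torch.qint8
    )

    # Compare outputs to gauge quantization-induced drift on this input.
    x = torch.randn(1, 128)
    with torch.no_grad():
        fp32_out, int8_out = model(x), quantized(x)
    print((fp32_out - int8_out).abs().max())

For accuracy-sensitive models, quantization-aware training instead inserts fake-quantization ops during training so the network learns to compensate for the reduced precision.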

  • It’s day 4 of our 11 Days of Inference Acceleration Techniques. Today, we’re moving on to runtime-level optimization best practices. Fourth tip: Take advantage of graph compilation. 📈 Graph compilers such as TVM, TensorRT, and OpenVINO take the computation graph of a specific model and generate code optimized for the target hardware. Graph compilation can optimize the graph structure by merging redundant operations, performing kernel auto-tuning, enhancing memory reuse, preventing cache misses, and more. But there are a few things to be aware of: 📝 Not all models compile equally. 📝 The impact of compilation on model performance can vary. Hence, check the architecture’s ability to compile early in the process, to avoid wasting time and resources on training a model that can’t be optimized for fast inference. A minimal export-and-run sketch follows below. #deeplearning #machinelearning #neuralnetworks #computervision
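
A minimal sketch of one common compilation path: export a PyTorch model’s computation graph to ONNX, then run it through ONNX Runtime, which applies graph-level optimizations such as operator fusion on session creation (TensorRT and OpenVINO can consume the same ONNX graph):

    import torch
    import torch.nn as nn
    import onnxruntime as ort

    # Toy model standing in for a real architecture.
    model = nn.Sequential(nn.Conv2d(3, 16, 3), nn.ReLU()).eval()
    dummy = torch.randn(1, 3, 224, 224)

    # Export the computation graph for the compiler/runtime to optimize.
    torch.onnx.export(model, dummy, "model.onnx", opset_version=17)

    # ONNX Runtime optimizes the graph when the session is created.
    sess = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])
    input_name = sess.get_inputs()[0].name
    out = sess.run(None, {input_name: dummy.numpy()})
    print(out[0].shape)

Trying this export early surfaces unsupported operators before you have invested in training the architecture.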

  • Our third inference acceleration technique recommendation is pretty straightforward: ⚙️ Consider adopting a hardware- and production-aware approach to model development. Production-aware model development takes the inference hardware into account very early in the development stage, particularly during model selection. Also known as the hardware-in-the-loop development approach, it considers the hardware the deep learning models will be deployed on, the most suitable software stack for that target hardware, and their limitations. A minimal latency-screening sketch follows below. #deeplearning #machinelearning #neuralnetworks #computervision
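
A minimal latency-screening sketch, assuming a PyTorch workflow: measure each candidate architecture on the target device before training, so designs that cannot meet the latency budget are eliminated up front (the candidates and budget here are toy stand-ins):

    import time
    import torch
    import torch.nn as nn

    def latency_ms(model, input_shape=(1, 3, 224, 224), runs=50):
        model.eval()
        x = torch.randn(*input_shape)
        with torch.no_grad():
            for _ in range(10):  # warm-up iterations
                model(x)
            start = time.perf_counter()
            for _ in range(runs):
                model(x)
        return (time.perf_counter() - start) / runs * 1000

    # Toy candidate architectures; in practice, untrained versions of the
    # real designs under consideration.
    candidates = {
        "narrow": nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU()),
        "wide": nn.Sequential(nn.Conv2d(3, 64, 3, padding=1), nn.ReLU()),
    }
    BUDGET_MS = 5.0  # hypothetical latency budget
    for name, m in candidates.items():
        ms = latency_ms(m)
        print(f"{name}: {ms:.2f} ms {'OK' if ms <= BUDGET_MS else 'over budget'}")

Untrained weights are fine for this screen, since latency depends on the architecture and the hardware, not on the learned parameter values.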

  • It’s day 2 of our 11 Days of Inference Acceleration Techniques. Today, we’re sharing another tip related to hardware. ⚙️ Know and get the most out of your hardware. When the inference hardware is already defined, make sure you’re aware of its various compute units. Nowadays, almost every hardware platform has more than one accelerator that can run deep learning algorithms. For example, an iPhone 12 contains a CPU, a GPU, and an Apple Neural Engine (ANE). Running EfficientFormer-L1 on an iPhone 12, the ANE delivers about 20x the throughput of the GPU, making it the far more power-efficient choice. 🧰 Once you know the ins and outs of your hardware, you can get the most out of it by employing the right algorithms and co-designing hardware and software when developing the architecture for your specific use case. A minimal sketch of targeting a specific accelerator follows below. #deeplearning #machinelearning #neuralnetworks #computervision
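
A minimal sketch of steering a model onto a specific accelerator, assuming a Core ML deployment path: coremltools lets you pin execution to the CPU, GPU, or Neural Engine via compute_units, so each unit can be compared on-device (MobileNetV2 is a stand-in, not EfficientFormer-L1):

    import torch
    import torchvision.models as models
    import coremltools as ct

    # Trace a stand-in model for conversion.
    torch_model = models.mobilenet_v2(weights=None).eval()
    traced = torch.jit.trace(torch_model, torch.randn(1, 3, 224, 224))

    # Convert, pinning execution to CPU + Apple Neural Engine; swapping in
    # CPU_AND_GPU or CPU_ONLY lets you benchmark each compute unit.
    mlmodel = ct.convert(
        traced,
        inputs=[ct.TensorType(shape=(1, 3, 224, 224))],
        compute_units=ct.ComputeUnit.CPU_AND_NE,
        convert_to="mlprogram",
    )
    mlmodel.save("model_ane.mlpackage")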

  • 📊 Inference is critical to the entire user experience, especially for applications with strict performance requirements such as autonomous vehicles and healthcare. The scale of inference is usually much larger than that of training, and the demand for cost-efficient compute will only grow: the more AI applications are deployed in the real world, the more the focus shifts to inference. For 11 days, we’re sharing techniques and best practices that can help AI teams improve the inference performance of deep learning applications. 💡 First tip: Benchmark your models on various hardware. Consider the hardware you have at hand and make sure you have the best model for it; if you’re targeting a hardware family, you need a model that works well across all of its members. Moreover, the best-performing model on one hardware platform isn’t necessarily the best on another: models perform differently on different hardware. Benchmarking lets you understand how models behave on different hardware and choose the best one for your use case, as in the sketch below. #deeplearning #machinelearning #neuralnetworks #computervision
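
A minimal benchmarking sketch in PyTorch: time the same model on every device you have, with warm-up iterations and explicit GPU synchronization so queued kernels don’t skew the numbers (ResNet-18 is a stand-in for your own model):

    import time
    import torch
    import torchvision.models as models

    def benchmark_ms(model, device, runs=30):
        model = model.to(device).eval()
        x = torch.randn(1, 3, 224, 224, device=device)
        with torch.no_grad():
            for _ in range(5):  # warm-up
                model(x)
            if device.type == "cuda":
                torch.cuda.synchronize()
            start = time.perf_counter()
            for _ in range(runs):
                model(x)
            if device.type == "cuda":
                torch.cuda.synchronize()  # wait for queued GPU kernels
        return (time.perf_counter() - start) / runs * 1000

    model = models.resnet18(weights=None)
    devices = [torch.device("cpu")]
    if torch.cuda.is_available():
        devices.append(torch.device("cuda"))
    for d in devices:
        print(d, f"{benchmark_ms(model, d):.2f} ms")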

  • 📈 LLM evaluation methods fall into two main categories: automatic and human-in-the-loop. Automatic evaluations comprise benchmarks and LLM-as-a-judge methods; human-in-the-loop evaluations include informal vibe checks, where individuals test LLMs to gauge initial impressions. In this guide, discover the pros and cons of four fundamental LLM evaluation methods, equipping you to choose the right evaluation strategy for your needs; a minimal LLM-as-a-judge sketch follows below. 👉 Read now > https://lnkd.in/g6Qgy9vk #largelanguagemodels #llms #generativeai
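
A minimal LLM-as-a-judge sketch; call_judge_llm is a hypothetical placeholder for whichever chat-completion client you use, and the 1-5 rubric is illustrative:

    JUDGE_PROMPT = """Rate the following answer to the question on a 1-5
    scale for correctness and helpfulness. Reply with a single integer.

    Question: {question}
    Answer: {answer}
    Rating:"""

    def call_judge_llm(prompt: str) -> str:
        # Hypothetical hook: wire up your LLM API client here.
        raise NotImplementedError

    def judge(question: str, answer: str) -> int:
        reply = call_judge_llm(JUDGE_PROMPT.format(question=question, answer=answer))
        return int(reply.strip())  # assumes the judge obeys the output format

Automatic judging scales cheaply, but the judge model has biases of its own, so spot-check a sample of its ratings by hand.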

  • 🔎 A small-object detection model, YOLO-NAS-Sat delivers an accuracy-latency trade-off that outperforms established models. Its performance is attributable to its architecture, generated by AutoNAC, Deci’s neural architecture search engine. YOLO-NAS-Sat is part of a broader family of highly efficient foundation models, including YOLO-NAS for standard object detection, YOLO-NAS Pose for pose estimation, and DeciSegs for semantic segmentation, all generated through AutoNAC. A minimal usage sketch follows below. Learn more about these foundation models here > https://lnkd.in/gSDi-f2a #computervision #objectdetection #yolo #neuralnetworks #deeplearning
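
A minimal usage sketch, assuming the open-source super-gradients library that hosts the YOLO-NAS family; yolo_nas_s stands in here since YOLO-NAS-Sat itself may not be available in the public repo, and "scene.jpg" is a hypothetical local image:

    from super_gradients.training import models

    # Load a pretrained YOLO-NAS model (small variant, COCO weights).
    model = models.get("yolo_nas_s", pretrained_weights="coco")

    # Run detection on an image and visualize the predicted boxes.
    predictions = model.predict("scene.jpg")
    predictions.show()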

Funding

Deci AI (Acquired by NVIDIA): 5 total rounds
Last round: Series B, US$ 25.0M
