Skit.ai’s Post

Exciting news from our ML research team!

Shangeth Rajaa
Senior ML Scientist | Skit.ai | NTU Singapore | IBM Research Labs | INRIA

We at Skit.ai are thrilled to announce the release of our latest Multi-Modal LLM models for Speech Understanding on Hugging Face, along with a comprehensive GitHub repository containing the code to train these models and run inference with them!

Unlike traditional ASR + LLM pipelines, our multi-modal speech LLMs leverage the acoustic, semantic, prosodic, and speaker information in the speech signal to predict attributes such as Transcript, Speech Activity, Gender, Age, Accent, and Emotion of the speaker in a conversation directly from the speech signal. The models can be further trained end-to-end to generate responses conditioned on this speaker metadata for task-oriented dialogue (TOD) systems, e.g., apologetic responses when the speaker sounds frustrated. This is similar in spirit to our previous demo blog on Multi-Modal LLMs for Conversational Agents: https://lnkd.in/gRaSi99X. Because training the model is simple, new perception or generation tasks can be added to it, e.g., multi-speaker transcription, speech environment classification, or speech translation.

🔗 Check it out:

Hugging Face Models:
• speechllm-2B: https://lnkd.in/gdRAwj3U
• speechllm-1.5B: https://lnkd.in/gdGA6Jzj

GitHub Repository: https://lnkd.in/gu6DSvmc

#MultiModalLLM #LLM #HuggingFace #GitHub #ConversationalAI
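
For readers who want to try the models, here is a minimal inference sketch. It assumes the 2B checkpoint is published under the Hugging Face ID skit-ai/speechllm-2B (the shortened links above are not expanded here) and that the repository ships custom modeling code exposing a generate_meta helper via trust_remote_code; please consult the linked GitHub repository for the exact interface.

```python
# Minimal sketch of running a SpeechLLM checkpoint from Hugging Face.
# Assumptions (not verified against the shortened links above):
#   - the 2B checkpoint is published as "skit-ai/speechllm-2B"
#   - the repo ships custom modeling code exposing a generate_meta() helper
# Consult the linked GitHub repository for the authoritative usage.
from transformers import AutoModel

model = AutoModel.from_pretrained(
    "skit-ai/speechllm-2B",   # assumed model ID; swap for the 1.5B variant if preferred
    trust_remote_code=True,   # loads the custom multi-modal speech LLM code
)

# Ask the model for several speaker attributes directly from the audio signal.
output = model.generate_meta(
    audio_path="sample.wav",  # hypothetical 16 kHz mono speech clip
    instruction=(
        "Give me the following information about the audio "
        "[Transcript, Gender, Age, Accent, Emotion]"
    ),
    max_new_tokens=256,
)
print(output)  # expected: a text/JSON-like string with the requested attributes
```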
