Planning is a key agentic AI design pattern in which we use a large language model (LLM) to autonomously decide on what sequence of steps to execute to accomplish a larger task. For example, if we ask an agent to do online research on a given topic, we might use an LLM to break down the objective into smaller subtasks, such as researching specific subtopics, synthesizing findings, and compiling a report.

Many people had a “ChatGPT moment” shortly after ChatGPT was released, when they were surprised that it significantly exceeded their expectations of what AI could do. If you have not yet had a similar “AI Agentic moment,” I hope you will soon!

I had one several months ago, when I presented a live demo of a research agent I had implemented that had access to various online search tools. I had tested this agent multiple times privately, during which it consistently used a web search tool to gather information and wrote up a summary. During the live demo, though, the web search API unexpectedly returned a rate-limiting error. I thought my demo was about to fail publicly, but to my surprise, the agent pivoted deftly to a Wikipedia search tool (which I had forgotten I’d given it) and completed the task using Wikipedia instead of web search. This was an AI Agentic moment of surprise for me. It’s a beautiful thing when you see an agent autonomously do things in ways that you had not anticipated, and succeed as a result!

Many tasks can’t be done in a single step. For example, to simplify an example from the HuggingGPT paper (cited below), if you want an agent to examine a boy’s picture and draw a picture of a girl in the same pose, the task might be decomposed into two steps: (i) detect the boy’s pose and (ii) render a picture of a girl in the detected pose. An LLM might be fine-tuned or prompted (with few-shot prompting) to specify a plan by outputting a string like "{tool: pose-detection, input: image.jpg, output: temp1} {tool: pose-to-image, input: temp1, output: final.jpg}". This structured output triggers software to invoke a pose-detection tool followed by a pose-to-image tool to complete the task. (This example is for illustrative purposes only; HuggingGPT uses a different format.)

Admittedly, many agentic workflows do not need planning. For example, you might have an agent reflect on, and improve, its output a fixed number of times, resulting in a fixed, deterministic sequence of steps. But for complex tasks in which you can’t specify a task decomposition ahead of time, Planning allows the agent to decide dynamically what steps to take.

To learn more, I recommend:
- Chain-of-Thought Prompting Elicits Reasoning in Large Language Models, Wei et al. (2022)
- HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face, Shen et al. (2023)
- Understanding the planning of LLM agents: A survey, Huang et al. (2024)

[Original text: https://lnkd.in/gM2ZWNsW]
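To make the plan-and-dispatch idea above concrete, here is a minimal Python sketch of the software layer that could sit downstream of the planning LLM: it parses a plan string in the illustrative format from the pose example and invokes each tool in order, passing intermediate results forward by name. The tool functions, the regex-based parser, and the plan format are all assumptions made for illustration; they are not HuggingGPT's actual interface.

```python
import re

# Hypothetical tool implementations -- stand-ins for real pose-detection
# and image-generation models. Names and behavior are illustrative only.
def pose_detection(image_path: str) -> str:
    """Pretend to extract a pose representation from an image."""
    return f"pose-extracted-from:{image_path}"

def pose_to_image(pose: str) -> str:
    """Pretend to render a new image from a pose representation."""
    return f"image-rendered-from:{pose}"

# Registry mapping the tool names an LLM may emit to callable tools.
TOOLS = {
    "pose-detection": pose_detection,
    "pose-to-image": pose_to_image,
}

def execute_plan(plan: str) -> dict:
    """Parse an LLM-emitted plan string and run each tool step in order.

    Each step looks like: {tool: <name>, input: <ref>, output: <ref>}.
    Intermediate results are stored under their output names so that
    later steps can refer to them.
    """
    results = {}
    steps = re.findall(
        r"\{tool:\s*([\w-]+),\s*input:\s*([\w.]+),\s*output:\s*([\w.]+)\s*\}",
        plan,
    )
    for tool_name, input_ref, output_ref in steps:
        tool = TOOLS[tool_name]
        # Use an earlier step's output if the input refers to one,
        # otherwise treat the input as a literal (e.g., a file path).
        argument = results.get(input_ref, input_ref)
        results[output_ref] = tool(argument)
    return results

# Example: the plan string an LLM might emit for the boy -> girl pose task.
plan = ("{tool: pose-detection, input: image.jpg, output: temp1} "
        "{tool: pose-to-image, input: temp1, output: final.jpg}")
print(execute_plan(plan))
```

In a real agent, the plan string would come from a fine-tuned or few-shot-prompted LLM, and the registry entries would wrap actual models or APIs. Keeping the tools in a plain dictionary also makes it easy to register fallbacks, such as the Wikipedia search tool that rescued the demo described above.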