Ana Sulakvelidze’s Post


MBAi Candidate at Kellogg School of Management

Great read

Andrew Ng

Founder of DeepLearning.AI; Managing General Partner of AI Fund; Founder and CEO of Landing AI

Planning is a key agentic AI design pattern in which we use a large language model (LLM) to autonomously decide on what sequence of steps to execute to accomplish a larger task. For example, if we ask an agent to do online research on a given topic, we might use an LLM to break down the objective into smaller subtasks, such as researching specific subtopics, synthesizing findings, and compiling a report.

Many people had a "ChatGPT moment" shortly after ChatGPT was released, when they were surprised that it significantly exceeded their expectations of what AI could do. If you have not yet had a similar "AI Agentic moment," I hope you will soon! I had one several months ago, when I presented a live demo of a research agent I had implemented that had access to various online search tools. I had tested this agent multiple times privately, during which it consistently used a web search tool to gather information and wrote up a summary. During the live demo, the web search API unexpectedly returned a rate-limiting error. I thought my demo was about to fail publicly. To my surprise, the agent pivoted deftly to a Wikipedia search tool, which I had forgotten I'd given it, and completed the task using Wikipedia instead of web search. This was an AI Agentic moment of surprise for me. It's a beautiful thing when you see an agent autonomously do things in ways that you had not anticipated, and succeed as a result!

Many tasks can't be done in a single step. For example, to simplify an example from the HuggingGPT paper (cited below), if you want an agent to examine a boy's picture and draw a picture of a girl in the same pose, the task might be decomposed into two steps: (i) detect the boy's pose and (ii) render a picture of a girl in the detected pose. An LLM might be fine-tuned or prompted (with few-shot prompting) to specify a plan by outputting a string like "{tool: pose-detection, input: image.jpg, output: temp1 } {tool: pose-to-image, input: temp1, output: final.jpg}". This structured output triggers software to invoke a pose detection tool followed by a pose-to-image tool to complete the task. (This example is for illustrative purposes only; HuggingGPT uses a different format. A minimal code sketch of this plan-and-dispatch loop appears after the reading list below.)

Admittedly, many agentic workflows do not need planning. For example, you might have an agent reflect on, and improve, its output a fixed number of times, resulting in a set of fixed, deterministic steps. But for complex tasks in which you can't specify a task decomposition ahead of time, Planning allows the agent to decide dynamically what steps to take.

To learn more, I recommend:
- Chain-of-Thought Prompting Elicits Reasoning in Large Language Models, Wei et al. (2022)
- HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face, Shen et al. (2023)
- Understanding the planning of LLM agents: A survey, Huang et al. (2024)

[Original text: https://lnkd.in/gM2ZWNsW ]
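
To make the plan-and-dispatch idea concrete, here is a minimal Python sketch of how ordinary software might parse a plan string like the one above and invoke the named tools in order. The plan format follows the illustrative string in the post; the tool stubs and the TOOLS registry are hypothetical placeholders, not HuggingGPT's actual format or API.

# Minimal sketch of the Planning pattern: an LLM emits a structured plan,
# and plain software parses it and invokes the named tools in sequence.
# The plan format and the tool stubs are illustrative assumptions.
import re

def pose_detection(path):
    # Hypothetical stub; a real tool would run a pose-estimation model.
    return f"pose-keypoints-from-{path}"

def pose_to_image(pose):
    # Hypothetical stub; a real tool would run a pose-conditioned image model.
    return f"image-rendered-from-{pose}"

TOOLS = {"pose-detection": pose_detection, "pose-to-image": pose_to_image}

# Plan string as the LLM might output it after few-shot prompting.
plan = ('{tool: pose-detection, input: image.jpg, output: temp1 } '
        '{tool: pose-to-image, input: temp1, output: final.jpg}')

# Extract each {tool: ..., input: ..., output: ...} step from the plan.
step_pattern = r"\{tool:\s*([\w-]+),\s*input:\s*([\w.-]+),\s*output:\s*([\w.-]+)\s*\}"
steps = re.findall(step_pattern, plan)

results = {}  # named intermediate outputs, e.g. results["temp1"]
for tool_name, tool_input, tool_output in steps:
    # If the input names an earlier output, pass that result along;
    # otherwise treat the input as a literal (e.g. a file path).
    value = results.get(tool_input, tool_input)
    results[tool_output] = TOOLS[tool_name](value)

print(results["final.jpg"])

In a real agent, the plan string would be produced by the LLM from the user's request and a description of the available tools rather than hard-coded, and each stub would call an actual model or API.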

Autonomous Coding Agents, Instability at Stability AI, Mamba Mania, and more

deeplearning.ai
