Planning is a key agentic AI design pattern in which we use a large language model (LLM) to autonomously decide on what sequence of steps to execute to accomplish a larger task. For example, if we ask an agent to do online research on a given topic, we might use an LLM to break down the objective into smaller subtasks, such as researching specific subtopics, synthesizing findings, and compiling a report.

Many people had a “ChatGPT moment” shortly after ChatGPT was released, when they were surprised that it significantly exceeded their expectations of what AI could do. If you have not yet had a similar “AI Agentic moment,” I hope you will soon!

I had one several months ago, when I presented a live demo of a research agent I had implemented that had access to various online search tools. I had tested this agent multiple times privately, during which it consistently used a web search tool to gather information and wrote up a summary. During the live demo, though, the web search API unexpectedly returned a rate-limiting error. I thought my demo was about to fail publicly, but to my surprise, the agent pivoted deftly to a Wikipedia search tool (which I had forgotten I’d given it) and completed the task using Wikipedia instead of web search. This was an AI Agentic moment of surprise for me. It’s a beautiful thing when you see an agent autonomously do things in ways that you had not anticipated, and succeed as a result!

Many tasks can’t be done in a single step. For example, to simplify an example from the HuggingGPT paper (cited below), if you want an agent to examine a boy’s picture and draw a picture of a girl in the same pose, the task might be decomposed into two steps: (i) detect the boy’s pose and (ii) render a picture of a girl in the detected pose. An LLM might be fine-tuned or prompted (with few-shot prompting) to specify a plan by outputting a string like "{tool: pose-detection, input: image.jpg, output: temp1} {tool: pose-to-image, input: temp1, output: final.jpg}". This structured output triggers software to invoke a pose-detection tool followed by a pose-to-image tool to complete the task. (This example is for illustrative purposes only; HuggingGPT uses a different format.)

Admittedly, many agentic workflows do not need planning. For example, you might have an agent reflect on, and improve, its output a fixed number of times, resulting in a fixed, deterministic sequence of steps. But for complex tasks in which you can’t specify a task decomposition ahead of time, Planning allows the agent to decide dynamically what steps to take.

To learn more, I recommend:
- Chain-of-Thought Prompting Elicits Reasoning in Large Language Models, Wei et al. (2022)
- HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face, Shen et al. (2023)
- Understanding the planning of LLM agents: A survey, Huang et al. (2024)

[Original text: https://lnkd.in/gM2ZWNsW]
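To make the plan-and-dispatch idea above concrete, here is a minimal Python sketch of the software layer that could sit downstream of the planning LLM: it parses a plan string in the illustrative format from the pose example and invokes each tool in order, passing intermediate results forward by name. The tool functions, the regex-based parser, and the plan format are all assumptions made for illustration; they are not HuggingGPT's actual interface.

```python
import re

# Hypothetical tool implementations -- stand-ins for real pose-detection
# and image-generation models. Names and behavior are illustrative only.
def pose_detection(image_path: str) -> str:
    """Pretend to extract a pose representation from an image."""
    return f"pose-extracted-from:{image_path}"

def pose_to_image(pose: str) -> str:
    """Pretend to render a new image from a pose representation."""
    return f"image-rendered-from:{pose}"

# Registry mapping the tool names an LLM may emit to callable tools.
TOOLS = {
    "pose-detection": pose_detection,
    "pose-to-image": pose_to_image,
}

def execute_plan(plan: str) -> dict:
    """Parse an LLM-emitted plan string and run each tool step in order.

    Each step looks like: {tool: <name>, input: <ref>, output: <ref>}.
    Intermediate results are stored under their output names so that
    later steps can refer to them.
    """
    results = {}
    steps = re.findall(
        r"\{tool:\s*([\w-]+),\s*input:\s*([\w.]+),\s*output:\s*([\w.]+)\s*\}",
        plan,
    )
    for tool_name, input_ref, output_ref in steps:
        tool = TOOLS[tool_name]
        # Use an earlier step's output if the input refers to one,
        # otherwise treat the input as a literal (e.g., a file path).
        argument = results.get(input_ref, input_ref)
        results[output_ref] = tool(argument)
    return results

# Example: the plan string an LLM might emit for the boy -> girl pose task.
plan = ("{tool: pose-detection, input: image.jpg, output: temp1} "
        "{tool: pose-to-image, input: temp1, output: final.jpg}")
print(execute_plan(plan))
```

In a real agent, the plan string would come from a fine-tuned or few-shot-prompted LLM, and the registry entries would wrap actual models or APIs. Keeping the tools in a plain dictionary also makes it easy to register fallbacks, such as the Wikipedia search tool that rescued the demo described above.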