How LangChain Agents Work

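LangChain is a programming framework built around LLMs (large language models). It is designed to help developers build end-to-end applications with LLMs, and it provides a set of tools, components, and interfaces that simplify creating applications powered by LLMs and chat models. LangChain is made up of several major components, including Models, Prompts, Chains, Memory, and Agents. Agents are an especially important part: if the LLM is the brain, the agent gives that brain hands and feet. Today we will take a close look at agents and how they work.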

What Is a LangChain Agent

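In LangChain, an agent receives the user's input, decides which actions to take, carries them out, and returns the result. You can think of an agent as a chain with built-in routing: instead of following a fixed sequence, it decides at run time which tools or chains to call. Building on the ideas behind MRKL and ReAct, an agent can combine tool use with natural-language reasoning to solve problems. LangChain also ships several ready-made agents, including the OpenAI Functions agent, the Plan-and-execute agent, the Self-ask with search agent, and AutoGPT-style agents. An agent's job is to complete tasks on behalf of a user or another system, such as collecting data, processing data, or supporting decisions. Agents can be autonomous, with some degree of intelligence and adaptability, so that they can carry out tasks in different situations. In this article we focus on agents built on the ReAct principle.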

ReAct

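ReAct (Reasoning + Acting) is an approach that combines reasoning and acting in a language model. LLMs have shown impressive abilities in language understanding and interactive decision-making, but their ability to reason (e.g., chain-of-thought prompting) and their ability to act (e.g., generating action plans) have mostly been studied as separate topics. ReAct explores having an LLM generate reasoning traces and task-specific actions in an interleaved manner so that the two reinforce each other: reasoning helps the model plan and track its actions, and actions let it gather outside information to ground its reasoning.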

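Imagine you have a smart assistant robot named Xiao Ming. You give Xiao Ming a task: go to the kitchen and make you a cup of coffee. Xiao Ming has to not only complete the task but also tell you, step by step, how he is doing it.

Xiao Ming without ReAct:

Xiao Ming goes straight to the kitchen.
You hear some noises, but you don't know what Xiao Ming is doing.
After a while, Xiao Ming comes back and hands you a cup of coffee.

The problem is that you have no idea how Xiao Ming made the coffee, whether he added sugar or milk, or whether he ran into any problems along the way.

Xiao Ming with ReAct:

Xiao Ming tells you: “I'm going to the kitchen now.”
Then he says: “I found the coffee grounds and the coffee machine.”
“I'm starting to brew the coffee now.”
“The coffee is ready; I'll add some sugar and milk.”
“All right, the coffee is done; I'm bringing it to you now.”

This time you know exactly how Xiao Ming made the coffee: every step and every decision he made.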

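That is the idea behind ReAct. It does not just carry out the task (acting); it also tells you how it thinks and decides (reasoning). As a result, you know not only that the task is done but also why it was done this way, and if something goes wrong, the cause is easier to find.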
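
To make the Thought/Action/Observation cycle concrete, here is a minimal, LangChain-free sketch of a ReAct loop. Everything in it is hypothetical: `scripted_llm` returns canned text in place of a real model, and `lookup` is a toy tool backed by a dictionary. It only illustrates how reasoning and acting interleave; it is not how LangChain itself implements agents.

```python
import re
from typing import Callable, Dict, List

# Hypothetical toy tool: answers from a fixed dictionary.
FACTS = {"capital of France": "Paris"}


def lookup(query: str) -> str:
    return FACTS.get(query, "not found")


TOOLS: Dict[str, Callable[[str], str]] = {"Lookup": lookup}


def scripted_llm(prompt: str) -> str:
    """Hypothetical stand-in for an LLM: returns canned ReAct-formatted text."""
    if "Observation:" not in prompt:
        return ("Thought: I should look this up.\n"
                "Action: Lookup\n"
                "Action Input: capital of France")
    return "Thought: I now know the final answer\nFinal Answer: Paris"


def react_loop(question: str, llm: Callable[[str], str], max_steps: int = 5) -> List[str]:
    """Alternate reasoning (LLM text) and acting (tool calls), recording each step."""
    transcript = [f"Question: {question}"]
    for _ in range(max_steps):
        output = llm("\n".join(transcript))  # reasoning step
        transcript.append(output)
        if "Final Answer:" in output:
            return transcript
        tool = re.search(r"Action: (.*)", output).group(1).strip()
        tool_input = re.search(r"Action Input: (.*)", output).group(1).strip()
        observation = TOOLS[tool](tool_input)  # acting step
        transcript.append(f"Observation: {observation}")
    raise RuntimeError("no final answer within max_steps")
```

Each `Observation:` line comes from actually running the tool, and every intermediate step stays in `transcript`, which is the same visibility the coffee example is about.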

For more on ReAct, see this article.

Custom LLM Agent

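The LangChain website provides an example of how to create a custom LLM agent. Alongside it, the official examples also include a plain custom agent. The difference is that a custom LLM agent uses an LLM to interpret the user's input and decide which tool to use, whereas a custom agent decides on tool use by itself, without an LLM. The latter only works for simple scenarios, while a custom LLM agent can handle more complex ones.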
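
To see why a custom agent without an LLM is limited to simple cases, consider a hypothetical keyword router (the rule table and tool names are made up for illustration). It works only when the user happens to use a keyword it knows; a custom LLM agent instead lets the model read the input and choose the tool.

```python
from typing import Dict, Optional

# Hypothetical rule table: keyword -> tool name.
RULES: Dict[str, str] = {
    "weather": "WeatherTool",
    "calculate": "Calculator",
}


def choose_tool_by_rules(user_input: str) -> Optional[str]:
    """Pick a tool by keyword matching; return None when no rule applies."""
    text = user_input.lower()
    for keyword, tool in RULES.items():
        if keyword in text:
            return tool
    return None
```

A question like "Will I need an umbrella tomorrow?" is clearly about the weather, yet no rule matches it; handling such paraphrases is exactly what the LLM adds.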

Prompt Template

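To implement the agent, we first need to define a ReAct-based prompt template; the agent in the example is built on the ReAct principle. The labels in the template (Thought, Action, Action Input, Observation, Final Answer) must match the exact strings that the output parser and the stop sequence look for later. The lightly modified template is: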

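template = """Answer the following questions as best you can. You have access to the following tools:

{tools}

Use the following format:

Question: the input question you must answer
Thought: you should always think about what to do
Action: the action to take, should be one of [{tool_names}]
Action Input: the input to the action
Observation: the result of the action
... (this Thought/Action/Action Input/Observation can repeat N times)
Thought: I now know the final answer
Final Answer: the final answer to the original input question

Begin!

Question: {input}
{agent_scratchpad}"""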
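
The template is an ordinary `str.format` string with four placeholders. As a quick sanity check, the sketch below lists them with Python's standard `string.Formatter`; `TEMPLATE` is a trimmed copy of the template for illustration, and `placeholders` is a hypothetical helper.

```python
from string import Formatter
from typing import Set

# Trimmed copy of the ReAct template (illustration only).
TEMPLATE = """Answer the following questions as best you can. You have access to the following tools:

{tools}

Action: the action to take, should be one of [{tool_names}]

Question: {input}
{agent_scratchpad}"""


def placeholders(template: str) -> Set[str]:
    """Return the names of all {placeholders} in a str.format-style template."""
    return {field for _, field, _, _ in Formatter().parse(template) if field}
```

Only `input` comes from the user. `tools` and `tool_names` are derived from the tool list, and `agent_scratchpad` from the agent's intermediate steps, which is why the prompt class in the next step fills those three itself.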

Building the Prompt

With the template ready, we can build the prompt. The official example code for this is:

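from typing import List

from langchain.agents import Tool
from langchain.prompts import StringPromptTemplate

# Set up a prompt template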
class CustomPromptTemplate(StringPromptTemplate):
    # The template to use
    template: str
    # The list of tools available
    tools: List[Tool]

    def format(self, **kwargs) -> str:
        # Get the intermediate steps (AgentAction, Observation tuples)
        # Format them in a particular way
        intermediate_steps = kwargs.pop("intermediate_steps")
        thoughts = ""
        for action, observation in intermediate_steps:
            thoughts += action.log
            thoughts += f"\nObservation: {observation}\nThought: "
        # Set the agent_scratchpad variable to that value
        kwargs["agent_scratchpad"] = thoughts
        # Create a tools variable from the list of tools provided
        kwargs["tools"] = "\n".join([f"{tool.name}: {tool.description}" for tool in self.tools])
        # Create a list of tool names for the tools provided
        kwargs["tool_names"] = ", ".join([tool.name for tool in self.tools])
        return self.template.format(**kwargs)

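# `tools` is assumed to be a list of Tool objects defined beforehand
# (the official example uses a single search tool)
prompt = CustomPromptTemplate(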
    template=template,
    tools=tools,
    # This omits the `agent_scratchpad`, `tools`, and `tool_names` variables because those are generated dynamically
    # This includes the `intermediate_steps` variable because that is needed
    input_variables=["input", "intermediate_steps"]
)
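
The least obvious part of `format` is how `agent_scratchpad` is assembled. The sketch below reproduces that loop without LangChain; `FakeAction` is a hypothetical stand-in for `AgentAction` that carries only the fields used here.

```python
from typing import List, NamedTuple, Tuple


class FakeAction(NamedTuple):
    """Hypothetical stand-in for LangChain's AgentAction; only `log` is used below."""
    tool: str
    tool_input: str
    log: str


def build_scratchpad(steps: List[Tuple[FakeAction, str]]) -> str:
    """Mirror the loop in CustomPromptTemplate.format: append each action's log,
    then its observation, then an open 'Thought: ' for the model to continue."""
    thoughts = ""
    for action, observation in steps:
        thoughts += action.log
        thoughts += f"\nObservation: {observation}\nThought: "
    return thoughts
```

The scratchpad ends with `Thought: `, so on the next call the model picks up exactly where a new reasoning step should begin; with no steps yet, it is simply empty.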

Output Parsing

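Next comes parsing the LLM's output, which covers two cases: parsing a tool call (an action and its input) and parsing the final answer: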

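import re
from typing import Union

from langchain.agents import AgentOutputParser
from langchain.schema import AgentAction, AgentFinish, OutputParserException

class CustomOutputParser(AgentOutputParser):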

    def parse(self, llm_output: str) -> Union[AgentAction, AgentFinish]:
        # Check if Agent should finish
        if "Final Answer:" in llm_output:
            return AgentFinish(
                # Return values is generally always a dictionary with a single `output` key
                # It is not recommended to try anything else at the moment :)
                return_values={"output": llm_output.split("Final Answer:")[-1].strip()},
                log=llm_output,
            )

        # Parse out the action and action input
        regex = r"Action\s*\d*\s*:(.*?)\nAction\s*\d*\s*Input\s*\d*\s*:[\s]*(.*)"
        match = re.search(regex, llm_output, re.DOTALL)
        if not match:
            raise OutputParserException(f"Could not parse LLM output: `{llm_output}`")
        action = match.group(1).strip()
        action_input = match.group(2)
        # Return the action and action input
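        return AgentAction(tool=action, tool_input=action_input.strip(" ").strip('"'), log=llm_output)

# Instantiate the parser; it is passed to the agent in the next step
output_parser = CustomOutputParser()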
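
To check the parsing rules in isolation, here is a LangChain-free sketch that applies the same checks and the same regular expression, returning a `(tool, input)` tuple or the final-answer string. `parse_output` is a hypothetical name, and it raises `ValueError` where the original raises `OutputParserException`.

```python
import re
from typing import Tuple, Union

# Same pattern as CustomOutputParser.parse.
ACTION_RE = re.compile(
    r"Action\s*\d*\s*:(.*?)\nAction\s*\d*\s*Input\s*\d*\s*:[\s]*(.*)", re.DOTALL
)


def parse_output(llm_output: str) -> Union[Tuple[str, str], str]:
    """Return the final answer if present, otherwise a (tool, tool_input) pair."""
    if "Final Answer:" in llm_output:
        return llm_output.split("Final Answer:")[-1].strip()
    match = ACTION_RE.search(llm_output)
    if not match:
        raise ValueError(f"Could not parse LLM output: `{llm_output}`")
    return match.group(1).strip(), match.group(2).strip(" ").strip('"')
```

Because the `Final Answer:` check runs first, output that contains a final answer is treated as finished even if it also mentions an action. Output that uses other labels falls into the error branch, which is the failure mode described in the summary for weaker LLMs.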

Stop Sequence

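The last step is to create the agent. Note the `stop` argument: generation halts as soon as the model emits `\nObservation:`, so the LLM cannot make up a tool result on its own; the real observation comes from running the chosen tool and is fed back through the intermediate steps. The example code is: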

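from langchain.agents import AgentExecutor, LLMSingleActionAgent
from langchain.chains import LLMChain
from langchain.llms import OpenAI

llm = OpenAI(temperature=0)
# LLM chain consisting of the LLM and a prompt
llm_chain = LLMChain(llm=llm, prompt=prompt)

tool_names = [tool.name for tool in tools]
agent = LLMSingleActionAgent(
    llm_chain=llm_chain,
    output_parser=output_parser,
    # Stop generating before the model writes an Observation itself
    stop=["\nObservation:"],
    allowed_tools=tool_names
)

# The executor runs the loop: call the agent, run the chosen tool,
# feed the observation back, and repeat until a final answer is returned
agent_executor = AgentExecutor.from_agent_and_tools(agent=agent, tools=tools, verbose=True)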
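
Without a stop sequence, a model will often keep going and write an `Observation:` line itself, inventing the tool's result. The sketch below simulates the truncation a stop sequence causes (with OpenAI's API, generation ends before the stop sequence and the returned text does not include it); `apply_stop` and the sample text are hypothetical.

```python
from typing import List


def apply_stop(text: str, stop: List[str]) -> str:
    """Cut generated text at the earliest stop sequence, excluding the stop string."""
    cut = len(text)
    for s in stop:
        idx = text.find(s)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]


# Hypothetical raw continuation in which the model invents an observation.
raw = ("Thought: I should search.\nAction: Search\nAction Input: capital of France"
       "\nObservation: Lyon\nThought: I now know the final answer\nFinal Answer: Lyon")
```

After truncation only the Thought/Action/Action Input part survives, so the fabricated "Lyon" never reaches the output parser.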

Summary

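That covers the example code for a custom LLM agent. Finally, a word about the LLM the agent uses. The official example uses OpenAI, i.e. a GPT-3.5 model. For better results, OpenAI's GPT-4 is recommended; at the time of writing it was the strongest LLM available. With a weaker LLM, the model is more likely to return output that does not follow the expected format, which the output parser then fails to parse.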

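Some people try to build ReAct agents with open-source LLMs, but in practice small open-source models (for example, 6B or 7B parameters) understand the prompt poorly and often do not follow the template's output format at all, so the agent cannot work properly. To build a good agent you still need a capable LLM; for now, a GPT-3.5-level model appears to be the minimum.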