AI Application Development in Practice: Fundamentals

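Introduction

If you are a front-end developer who is also interested in AI development, congratulations: your opportunity has arrived.

If you are not, that is fine too. This article should still help you understand how AI applications are built.

We start with the basics of developing against AI, then move on to RAG, Agents, and workflow orchestration, to give an in-depth look at how to land AI projects inside an enterprise.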

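Fundamentals

1. How to Interact with AI

Normally, when we send a piece of text, the AI model answers from what the large model itself has learned; most readers will already be familiar with this. But what if we want the AI to answer based on the content we expect, that is, based on our own private-domain information? What options do we have?

Model training
Fine-tuning
Prompt engineering
RAG (retrieval-augmented generation)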

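Model training:

Download an open-source model from Hugging Face and deploy it locally, for example the recently released Llama 3 8B (smaller models place relatively lower demands on the GPU), then train it on a large volume of document data.

Although a small model reduces the GPU compute required, the cost is still more than an ordinary company can bear. Beyond the hardware itself and the engineering effort of optimising the model, there is also the risk of obsolescence: once a major vendor releases something significantly better, the model we trained may be left behind. Even so, companies should adopt a defensive strategy and embrace AI early, since it is clearly the direction things are heading. Model interfaces at the application layer have already been standardised in the open-source community, so if the design decouples the model from the features, the model can be swapped at any time.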
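As a rough illustration of decoupling the model from the features, here is a minimal sketch. The `ChatModel` interface and both implementations are hypothetical stand-ins, not part of any real SDK; the point is only that feature code depends on the interface rather than on a concrete model.

```typescript
// Hypothetical interface: the only thing feature code knows about a model.
interface ChatModel {
  readonly modelName: string;
  complete(prompt: string): string;
}

// Stand-in for a locally deployed open-source model (no real inference here).
class LocalModel implements ChatModel {
  readonly modelName = "local-llama";
  complete(prompt: string): string {
    return `[${this.modelName}] ${prompt}`;
  }
}

// Stand-in for a hosted commercial model.
class HostedModel implements ChatModel {
  readonly modelName = "hosted-model";
  complete(prompt: string): string {
    return `[${this.modelName}] ${prompt}`;
  }
}

// A feature written against the interface, not a concrete model.
function summarize(model: ChatModel, text: string): string {
  return model.complete(`Summarize: ${text}`);
}

// Swapping the model is a one-line change at the call site.
const viaLocal = summarize(new LocalModel(), "quarterly report");
const viaHosted = summarize(new HostedModel(), "quarterly report");
```

If the locally trained model is later retired, only the object passed to `summarize` changes.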

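Fine-tuning:

Many commercial AI model services offer this capability, allowing users to adjust a pre-trained model for a specific application scenario so that its output is closer to what they expect.

For example, suppose your company has an internal project code-named "Project", and you want to use an LLM to automatically generate documentation about "Project" or answer employees' questions about it. The pre-trained model has never encountered the term "Project" in this sense, so it cannot produce accurate information about it. In that case you can fine-tune the model with the relevant terminology and context to improve its understanding of this area.

Finally, fine-tuning is a paid service. If you switch to a different model later, you have to fine-tune again to fit the new model's characteristics and improvements, which incurs the compute and time costs once more.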

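Prompt engineering:

This is probably the first technique that newcomers to AI development use to get the AI to carry out instructions the way we expect, for example asking the model to answer in Chinese whenever possible. You prepare a prompt covering the role, background, skills, output style, output scope, and so on, and include it in the context of every request.

If you use the chat_model approach (a LangChain term), the system prompt is always kept at index 0 of the messages array. With the LLM approach (also a LangChain term), the prompt and the question are wrapped together into a single message string on every request. Here we should prefer the chat_model approach.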
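The difference between the two approaches can be sketched as follows. This is a minimal illustration that does not use LangChain itself; `buildChatMessages`, `buildLlmPrompt`, and the prompt text are hypothetical names chosen for the example.

```typescript
type Role = "system" | "user" | "assistant";

interface ChatMessage {
  role: Role;
  content: string;
}

const SYSTEM_PROMPT = "You are a helpful assistant. Answer in Chinese whenever possible.";

// Chat-model style: the system prompt always sits at index 0 of the messages array.
function buildChatMessages(pastTurns: ChatMessage[], question: string): ChatMessage[] {
  const system: ChatMessage = { role: "system", content: SYSTEM_PROMPT };
  const user: ChatMessage = { role: "user", content: question };
  return [system].concat(pastTurns, [user]);
}

// LLM (plain completion) style: prompt and question are packed into one string.
function buildLlmPrompt(question: string): string {
  return `${SYSTEM_PROMPT}\n\nQuestion: ${question}\nAnswer:`;
}

const chatMessages = buildChatMessages(
  [
    { role: "user", content: "Hi" },
    { role: "assistant", content: "Hello, how can I help?" },
  ],
  "What is RAG?"
);
const llmPrompt = buildLlmPrompt("What is RAG?");
```

With the chat-model shape, roles stay explicit, so history and system instructions do not need to be re-serialised into free text on each request.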

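However, when you want to put this into your own project for real, you may find that prompts are very hard to optimise and that the AI does not fully follow your requirements. In summary, prompts have the following pain points:

Hard to design. If the model's output depends on the feedback we give through the prompt, this can turn into a loop in which we keep adjusting the prompt to get better output.
Length limits. The messages of each request usually contain the prompt, n rounds of history as context, and the current question. The total length of all of this also determines the token cost of a single request, and an overly long prompt easily causes the AI to hallucinate, which degrades the reply.
Prompts still cannot make the model answer from the private domain, that is, from our company's internal knowledge base.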
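One common way to handle the length limit is to drop the oldest history first. The sketch below assumes a deliberately crude token estimate (one token per character) and a hypothetical `fitToBudget` helper; a real project would use the tokenizer that matches its model.

```typescript
interface Turn {
  role: "user" | "assistant";
  content: string;
}

// Crude stand-in for a tokenizer: one token per character.
// Real tokenizers count differently; this is only for illustration.
function estimateTokens(text: string): number {
  return text.length;
}

// Always keep the system prompt and the new question, then add past turns
// from newest to oldest until the budget is used up.
function fitToBudget(systemPrompt: string, pastTurns: Turn[], question: string, maxTokens: number): Turn[] {
  let used = estimateTokens(systemPrompt) + estimateTokens(question);
  const kept: Turn[] = [];
  for (let i = pastTurns.length - 1; i >= 0; i--) {
    const cost = estimateTokens(pastTurns[i].content);
    if (used + cost > maxTokens) {
      break; // older turns no longer fit
    }
    used += cost;
    kept.unshift(pastTurns[i]); // restore chronological order
  }
  return kept;
}

const trimmedTurns = fitToBudget(
  "Be concise.",
  [
    { role: "user", content: "aaaaaaaaaa" },      // oldest
    { role: "assistant", content: "bbbbbbbbbb" },
    { role: "user", content: "cccccccccc" },      // newest
  ],
  "Why?",
  40
);
```

Here the oldest turn is dropped because including it would exceed the 40-token budget.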

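RAG (retrieval-augmented generation):

RAG can feel rather abstract to newcomers, so let me borrow LangChain's diagrams to introduce it.

First comes embedding and vector storage.

After extracting the content of our internal documents, we slice it into an array of passages (chunks) and pass them to the model's embed interface. The model returns vectors of floating-point numbers; this process is called embedding. Finally, we store those vectors in a vector store. Common choices include Elasticsearch (es) and FAISS.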

[Figure: embedding and vector storage flow, from LangChain]
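The indexing steps above can be sketched as follows. Everything here is a stand-in: `toyEmbed` just counts letters and only mimics the shape of an embed call, and the array returned by `indexDocument` plays the role of a vector store such as Elasticsearch or FAISS. The chunk size and overlap are arbitrary illustrative values.

```typescript
// Split text into fixed-size chunks with a small overlap between neighbours.
function chunkText(text: string, size: number, overlap: number): string[] {
  if (overlap >= size) {
    throw new Error("overlap must be smaller than size");
  }
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += size - overlap) {
    chunks.push(text.slice(start, start + size));
    if (start + size >= text.length) {
      break; // the last chunk already reaches the end of the text
    }
  }
  return chunks;
}

// Stand-in for the model's embed interface: counts of the letters a-z.
// A real embedding model returns a dense float vector learned from data.
function toyEmbed(text: string): number[] {
  const vec: number[] = [];
  for (let i = 0; i < 26; i++) {
    vec.push(0);
  }
  const lower = text.toLowerCase();
  for (let i = 0; i < lower.length; i++) {
    const idx = lower.charCodeAt(i) - 97; // 'a' is char code 97
    if (idx >= 0 && idx < 26) {
      vec[idx] += 1;
    }
  }
  return vec;
}

interface VectorRecord {
  chunk: string;
  vector: number[];
}

// In-memory stand-in for a vector store.
function indexDocument(text: string): VectorRecord[] {
  return chunkText(text, 10, 2).map((chunk) => ({ chunk, vector: toyEmbed(chunk) }));
}

const vectorStore = indexDocument("abcdefghijklmnopqrstuvwxyz");
```

The overlap keeps a sentence that straddles a chunk boundary partly visible in both neighbours, which is one of many chunking choices that affect recall quality.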

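Next comes content recall.

Given a question, we first use the embedding model to turn the question into a vector, then run a similarity search over our document store. The entries with the closest similarity are recalled and handed to the large model to summarise, and the result is returned to the user.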

[Figure: content recall flow, from LangChain]
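A minimal sketch of the recall step, assuming cosine similarity as the similarity measure (other measures are possible) and hand-made 3-dimensional vectors in place of real embeddings. `recallTopK` and the sample knowledge base are hypothetical.

```typescript
interface StoredChunk {
  text: string;
  vector: number[];
}

// Cosine similarity: dot product divided by the product of the vector norms.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) {
    return 0; // treat zero vectors as having no similarity
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return the k stored chunks most similar to the query vector.
function recallTopK(queryVector: number[], store: StoredChunk[], k: number): StoredChunk[] {
  return store
    .map((item) => ({ item, score: cosineSimilarity(queryVector, item.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((scored) => scored.item);
}

// Hand-made vectors standing in for real embeddings.
const knowledgeBase: StoredChunk[] = [
  { text: "Expense claims are filed in the finance portal.", vector: [1, 0, 0] },
  { text: "VPN access is requested through the IT desk.", vector: [0, 1, 0] },
  { text: "Travel expenses need manager approval.", vector: [0.9, 0, 0.1] },
];

const recalled = recallTopK([0.95, 0.05, 0], knowledgeBase, 2);

// The recalled chunks are then placed into the prompt for the model to summarise.
const ragPrompt =
  "Answer using only the context below.\n\n" +
  recalled.map((c) => "- " + c.text).join("\n") +
  "\n\nQuestion: How do I claim expenses?";
```

A production vector store would run this search with an index rather than a full scan, but the ranking idea is the same.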

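That is the whole RAG process. RAG is technically very demanding, and the flow above does not come close to conveying its complexity. Even now, after our product has gone live, we are still experimenting with ways to improve RAG quality. Getting it to work is easy; getting it to work well is very hard.

We will discuss our current approach and its real-world results later, when we get to the internal knowledge base.

To quote a passage from another article that matches my own experience:

RAG actually covers a lot of ground, including embedding, tokenisation and chunking, retrieval and recall (similarity matching), the chat system, ReAct, and prompt optimisation, and finally the interaction with the LLM. The technical complexity of the whole process is high. If the LLM you use is very good, the large model is actually the part you need to worry about least. But if none of these stages reaches 1 (say 0.9, 0.7, ...), the final result may be the product of those decimals.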
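To make the compounding effect concrete, here is a small worked example. The stage names and scores are purely illustrative numbers, not measurements from any system.

```typescript
// Hypothetical per-stage quality scores between 0 and 1 (illustrative only).
const stageQuality: { stage: string; score: number }[] = [
  { stage: "chunking", score: 0.9 },
  { stage: "embedding", score: 0.9 },
  { stage: "recall", score: 0.7 },
  { stage: "prompt", score: 0.9 },
  { stage: "generation", score: 0.8 },
];

// End-to-end quality modelled as the product of the stage scores.
const overallQuality = stageQuality.reduce((acc, s) => acc * s.score, 1);
// 0.9 * 0.9 * 0.7 * 0.9 * 0.8 ≈ 0.41
```

Even though four of the five stages score 0.8 or higher, the pipeline as a whole lands at roughly 0.41, which is why every stage has to be tuned.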

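2. Agent

So far we have mainly covered how AI delivers text content. How do we get AI to deliver actual work?

When reporting on your work, wouldn't it be compelling to demonstrate your AI Agent's capabilities with a diagram like the one below?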

[Figure: AI Agent overview]

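(Taken from a slide shared at QCon)

At present there are two main ways to implement an Agent.

ReAct self-reasoning

Few-shot Prompt + Thought + Action + Observation

By constructing a prompt structure that contains tools, reasoning, and planning, the model iterates and adjusts itself through its interaction with the prompt, in order to choose a suitable tool or produce a better output.

For example: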

{
    "messages": [
        {
            "role": "system",
            "content": "Assistant is a large language model trained by OpenAI.\n\nAssistant is designed to be able to assist with a wide range of tasks, from answering simple questions to providing in-depth explanations and discussions on a wide range of topics. As a language model, Assistant is able to generate human-like text based on the input it receives, allowing it to engage in natural-sounding conversations and provide responses that are coherent and relevant to the topic at hand.\n\nAssistant is constantly learning and improving, and its capabilities are constantly evolving. It is able to process and understand large amounts of text, and can use this knowledge to provide accurate and informative responses to a wide range of questions. Additionally, Assistant is able to generate its own text based on the input it receives, allowing it to engage in discussions and provide explanations and descriptions on a wide range of topics.\n\nOverall, Assistant is a powerful system that can help with a wide range of tasks and provide valuable insights and information on a wide range of topics. Whether you need help with a specific question or just want to have a conversation about a particular topic, Assistant is here to assist. However, above all else, all responses must adhere to the format of RESPONSE FORMAT INSTRUCTIONS."
        },
        {
            "role": "user",
            "content": "TOOLS\n------\nAssistant can ask the user to use tools to look up information that may be helpful in answering the users original question. The tools the human can use are:\n\ninfo-tool: Useful for situations where you need to retrieve content through one or more URLs from https://info.bilibili.co/. Input should be a comma-separated list in the format of \"one or more valid URLs with the domain https://info.bilibili.co/pages/viewpage.action, where the URL should include the pageId parameter\", followed by \"the information you need to summarize, or to obtain a summary\".\n\nRESPONSE FORMAT INSTRUCTIONS\n----------------------------\n\nOutput a JSON markdown code snippet containing a valid JSON object in one of two formats:\n\n**Option 1:**\nUse this if you want the human to use a tool.\nMarkdown code snippet formatted in the following schema:\n\n```json\n{\n    \"action\": string, // The action to take. Must be one of [info-tool]\n    \"action_input\": string // The input to the action. May be a stringified object.\n}\n```\n\n**Option #2:**\nUse this if you want to respond directly and conversationally to the human. Markdown code snippet formatted in the following schema:\n\n```json\n{\n    \"action\": \"Final Answer\",\n    \"action_input\": string // You should put what you want to return to use here and make sure to use valid json newline characters.\n}\n```\n\nFor both options, remember to always include the surrounding markdown code snippet delimiters (begin with \"```json\" and end with \"```\")!\n\n\nUSER'S INPUT\n--------------------\nHere is the user's input (remember to respond with a markdown code snippet of a json blob with a single action, and NOTHING else):\n\nhttps://info.bilibili.co/pages/viewpage.action?pageId=849684529\nWhat is this article about?"
        }
    ]
}

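Through the prompt we tell the model that it is good at using tools to solve problems, describe each tool and the arguments it needs, and finally require the model to reply in markdown code format every time. Our Agent program then consumes the returned JSON and decides whether to call a tool or treat the reply as the Final Answer.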
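Consuming such a reply on the Agent side might look like the sketch below. It is an assumption-laden illustration rather than our production code: `parseAgentReply`, `handleReply`, and the `reactTools` registry are hypothetical, and the tool simply echoes its input instead of fetching a page.

```typescript
interface AgentAction {
  action: string;
  action_input: string;
}

// Three backticks, built by concatenation so this listing stays readable.
const FENCE = "`" + "`" + "`";

// Pull the JSON object out of a reply shaped like: FENCE + "json\n{...}\n" + FENCE
function parseAgentReply(reply: string): AgentAction {
  const start = reply.indexOf(FENCE + "json");
  const end = reply.lastIndexOf(FENCE);
  if (start === -1 || end <= start) {
    throw new Error("Reply is not a markdown JSON snippet");
  }
  const body = reply.slice(start + FENCE.length + "json".length, end);
  const parsed = JSON.parse(body) as AgentAction;
  if (typeof parsed.action !== "string") {
    throw new Error("Reply is missing the action field");
  }
  return parsed;
}

// Hypothetical tool registry; a real info-tool would fetch and summarise pages.
const reactTools: { [toolName: string]: (input: string) => string } = {
  "info-tool": (input) => "Observation for: " + input,
};

// One step of the loop: either finish, or run the tool and return an observation.
function handleReply(reply: string): { done: boolean; text: string } {
  const step = parseAgentReply(reply);
  if (step.action === "Final Answer") {
    return { done: true, text: step.action_input };
  }
  const tool = reactTools[step.action];
  if (!tool) {
    throw new Error("Unknown tool: " + step.action);
  }
  return { done: false, text: tool(step.action_input) };
}

const toolReply = FENCE + 'json\n{"action": "info-tool", "action_input": "pageId=1, summarise"}\n' + FENCE;
const finalReply = FENCE + 'json\n{"action": "Final Answer", "action_input": "It is a design doc."}\n' + FENCE;
const toolStep = handleReply(toolReply);
const finalStep = handleReply(finalReply);
```

When `done` is false, the observation text is appended to the conversation and the model is called again; the parser throws as soon as the model stops following the required format.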

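Tool-call agent interaction

Clearly, ReAct makes our context very long, and after a few rounds of iteration the model easily stops returning content in the markdown code format, which eventually leaves the Agent unable to continue.

Tool calls solve this problem. We convert the unstructured tool descriptions in the prompt into structured API fields, which both saves prompt context length and makes the behaviour easier to control.

For example: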

// POST /chat/completions
{
  ...
  "tools": [
      {
        "type": "function",
        "function": {
          "name": "info-tool",
          "description": "Open one or more xxxx site pages identified by pageId and fulfil the user's request",
          "parameters": {
            "type": "object",
            "properties": {
              "pageId": {
                "type": "number",
                "description": "The pageId from the URL; separate multiple values with commas"
              },
              "task": {
                "type": "string",
                "description": "Describe the request"
              }
            },
            "required": [
              "pageId",
              "task"
            ],
            "additionalProperties": false,
            "$schema": "http://json-schema.org/draft-07/schema#"
          }
        }
      },
      ...more tools
  ],
  ...
}

The model then tells you, also in a structured form, which tool it wants to use:

// API Response
{
    ...
    "tool_calls": [
        {
            "index": 0,
            "id": "info-tool:0",
            "type": "function",
            "function": {
                "name": "info-tool",
                "arguments": "{\n    \"task\": \"Get the page content\",\n    \"pageId\": 845030990\n}"
            }
        }
    ]
    ...
}
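On the Agent side, a structured response like this can be dispatched without any text parsing beyond decoding the `arguments` string. The sketch below is a minimal illustration: `toolHandlers` and `runToolCalls` are hypothetical names, the handler only formats a string, and the `role: "tool"` result message with `tool_call_id` assumes an OpenAI-style chat API.

```typescript
interface ToolCall {
  id: string;
  type: "function";
  function: { name: string; arguments: string }; // arguments is a JSON-encoded string
}

interface ToolResultMessage {
  role: "tool";
  tool_call_id: string;
  content: string;
}

// Hypothetical local implementations, keyed by the tool name declared in the request.
const toolHandlers: { [toolName: string]: (args: { [key: string]: unknown }) => string } = {
  "info-tool": (args) => "Fetched page " + String(args["pageId"]) + " for task: " + String(args["task"]),
};

// Run every tool the model asked for and build the messages to send back to it.
function runToolCalls(calls: ToolCall[]): ToolResultMessage[] {
  return calls.map((call) => {
    const handler = toolHandlers[call.function.name];
    if (!handler) {
      throw new Error("Unknown tool: " + call.function.name);
    }
    const args = JSON.parse(call.function.arguments) as { [key: string]: unknown };
    const message: ToolResultMessage = { role: "tool", tool_call_id: call.id, content: handler(args) };
    return message;
  });
}

const toolMessages = runToolCalls([
  {
    id: "info-tool:0",
    type: "function",
    function: { name: "info-tool", arguments: '{"task": "Get the page content", "pageId": 845030990}' },
  },
]);
```

Each result carries the id of the call it answers, so the model can match observations to its requests on the next turn.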

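3. Development Frameworks

Next, let me introduce the technical frameworks we chose; later on I will also cover their strengths and weaknesses.

LangChain

LangChain comes up in many articles about AI, and many open-source frameworks compare themselves with it. LangChain is a framework that integrates commercial and open-source models and provides a full set of tools and features that simplify developing, integrating, and deploying applications built on language models.

Components: abstraction layers for working with language models, plus a set of implementations for each abstraction. Components are modular and easy to use, whether or not you use the rest of the LangChain framework.
Off-the-shelf chains: structured collections of components for accomplishing specific higher-level tasks.

Put simply, it provides a unified input and output specification across different models and different components.

A Chain can take [Prompt, Model, Tool, Memory (conversation history), OutputParser], and multiple models can also be nested, so that the output of one model becomes the input of the next PromptTemplate.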
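The idea of feeding one model's output into the next PromptTemplate can be sketched without LangChain itself. Everything below is a hypothetical stand-in: `pipe2`, `promptTemplate`, and `fakeModel` only imitate the shape of a chain and do not call any real model.

```typescript
// A step maps one value to the next; a chain is steps applied in order.
type Step<I, O> = (input: I) => O;

function pipe2<A, B, C>(first: Step<A, B>, second: Step<B, C>): Step<A, C> {
  return (input) => second(first(input));
}

// Stand-in prompt template: fills {slot} placeholders from a variables object.
function promptTemplate(template: string): Step<{ [key: string]: string }, string> {
  return (vars) =>
    template.replace(/\{(\w+)\}/g, (match: string, key: string) => (key in vars ? vars[key] : match));
}

// Stand-in model: a real chat model would call an API here.
const fakeModel: Step<string, string> = (prompt) => "MODEL(" + prompt + ")";

// Chain 1: prompt -> model.
const translateChain = pipe2(promptTemplate("Translate to English: {text}"), fakeModel);

// Chain 2: chain 1's output becomes the {draft} variable of the next template.
const summaryChain = pipe2(
  pipe2(translateChain, (translated: string): { [key: string]: string } => ({ draft: translated })),
  pipe2(promptTemplate("Summarise: {draft}"), fakeModel)
);

const chainOutput = summaryChain({ text: "bonjour" });
```

Because every step shares the same input-to-output shape, steps can be reordered or replaced without changing the code that composes them.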

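There are currently two official language versions: one for Python and one for Node.js.

Flowise

An AI workflow orchestration system built on LangChain, with Node.js as its main language. It provides visual low-code components for each of LangChain's model classes and component classes; by dragging components onto a canvas you can build the entire AI delivery flow. Components include Chain (process), Prompt, Agent Tool, Chat Model, and more.

A similar product is Dify, which offers a complete productised solution covering multi-model integration, RAG, task orchestration, and more.

Flowise is more like an unfurnished shell of a house: it provides the solution, but all the productisation still has to be built yourself. Understanding it well makes developing with LangChain far more efficient. Dify is more like a luxury villa: most features are already productised, it maintains its own wrappers around the model APIs, and its main language is Python.

The packages in Flowise:

Server: Express; CRUD, and runs the component instances from the component library
Component: JavaScript; implements the visual, low-code wrappers for LangChain classes
UI: React; the canvas for AI workflow orchestration, plus some maintenance pages

Below is an orchestration in which an Agent lets the AI decide which tools to use. We re-implemented the Agent component to better fit our tool-call feature; in the Bili Agent main process, the component is responsible for consuming the connected tools.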

[Figure: Agent orchestration in which the AI chooses which tools to use]

Partial code example

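// Import paths other than 'langchain/agents' are assumed (LangChain JS 0.1.x) and may differ in other versions.
import { AgentExecutor } from 'langchain/agents'
import { OpenAIToolsAgentOutputParser, type ToolsAgentStep } from 'langchain/agents/openai/output_parser'
import { formatToOpenAITool } from '@langchain/openai'
import { RunnableSequence } from '@langchain/core/runnables'
import type { AgentStep } from '@langchain/core/agents'
import type { BaseMessage } from '@langchain/core/messages'

// model, tools, prompt, memory, flowObj, chatHistory, inputKey, memoryKey and
// formatToolAgentSteps are not shown in this excerpt.

// Convert each tool's configuration into the structured `tools` field of the model API.
// Because the interface follows the same specification, formatToOpenAITool can be used directly.
const modelWithTools = model.bind({
    tools: tools.map((tool: any) => formatToOpenAITool(tool))
})

// Compose the steps in order
const runnableAgent = RunnableSequence.from([
    // The user's instruction, the ToolMessages produced by formatting the tool_calls
    // in the model's messages, and the chat history are all fed into the prompt.
    {
        [inputKey]: (i: { input: string; steps: AgentStep[] }) => i.input,
        agent_scratchpad: (i: { input: string; steps: ToolsAgentStep[] }) => formatToolAgentSteps(i.steps),
        [memoryKey]: async (_: { input: string; steps: AgentStep[] }) => {
            const messages = (await memory.getChatMessages(flowObj?.sessionId, true, chatHistory)) as BaseMessage[]
            return messages ?? []
        }
    },
    prompt,
    modelWithTools,
    new OpenAIToolsAgentOutputParser()
])

const executor = AgentExecutor.fromAgentAndTools({
    agent: runnableAgent,
    tools,
    returnIntermediateSteps: true,
    maxIterations: 5
})

executor.invoke({ input: 'What is the date tomorrow?' })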
 
// Example tool_calls
{
    "tool_calls": [
      {
        "index": 0,
        "id": "GetDate:0",
        "type": "function",
        "function": {
          "name": "GetDate",
          "arguments": "{\n    \"task\": \"Get tomorrow's date\"\n}"
        }
      }
    ]
}

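Finally, through the Agent's configuration, the model can freely choose among the general domain, the private domain, and tool plugins when chatting.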
