Troubleshooting an OOM Problem in a Gray Release System Built with Node.js

We implemented a gray release system for a Node.js service and chose a process-based approach. Expressed as simplified code, it looks like this:

// index.js
const cp = require('child_process')
const url = require('url')
const http = require('http')

const child1 = cp.fork('child.js', [], {
  env: {PORT: 3000},
})
const child2 = cp.fork('child.js', [], {
  env: {PORT: 3001},
})

function afterChildrenReady() {
  let readyN = 0
  let _resolve

  const p = new Promise((resolve) => {
    _resolve = resolve
  })

  const onReady = (msg) => {
    if (msg === 'ready') {
      if (++readyN === 2) {
        _resolve()
      }
    }
  }

  child1.on('message', onReady)
  child2.on('message', onReady)

  return p
}

const httpServer = http.createServer(function (req, res) {
  const query = url.parse(req.url, true).query

  if (query.version === 'v1') {
    http.get('http://localhost:3000', (proxyRes) => {
      proxyRes.pipe(res)
    })
  } else {
    http.get('http://localhost:3001', (proxyRes) => {
      proxyRes.pipe(res)
    })
  }
})

afterChildrenReady().then(() => {
  httpServer.listen(8000, () => console.log('Start http server on 8000'))
})

// child.js
const http = require('http')

const httpServer = http.createServer(function (req, res) {
  res.writeHead(200, {'Content-Type': 'text/plain'})
  setTimeout(() => {
    res.end('handled by child, pid is ' + process.pid + '\n')
  }, 1000)
})

httpServer.listen(process.env.PORT, () => {
  process.send && process.send('ready')
  console.log(`Start http server on ${process.env.PORT}`)
})

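A brief explanation of the code above: running index.js forks two child processes, and the main process decides which child to proxy each request to based on a request parameter, so that different users see different content.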
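The routing decision can be isolated into a small function, which makes it easier to see and to test. This is only a sketch: pickUpstreamPort, UPSTREAM_PORTS and DEFAULT_PORT are hypothetical names that do not appear in the original code, and it uses the WHATWG URL parser in place of url.parse.

```javascript
// Hypothetical helper: map a request URL to the port of the child that should serve it.
const UPSTREAM_PORTS = new Map([['v1', 3000]]) // version -> port, as in index.js
const DEFAULT_PORT = 3001 // every other request goes to the second child

function pickUpstreamPort(reqUrl) {
  // req.url is only a path plus query, so the URL parser needs a placeholder base.
  const {searchParams} = new URL(reqUrl, 'http://localhost')
  const version = searchParams.get('version')
  return UPSTREAM_PORTS.get(version) || DEFAULT_PORT
}

console.log(pickUpstreamPort('/?version=v1')) // 3000
console.log(pickUpstreamPort('/')) // 3001
```

Inside the request handler, the two branches would then collapse into a single http.get call against the chosen port.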

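However, the extra proxy layer inevitably costs some performance. As an optimization, we can reuse TCP connections by passing an agent when calling http.request. That optimization, however, is exactly what caused the service to fail.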

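Let's reproduce it. First, modify the code above to enable TCP connection reuse and allow only one connection to be opened (maxSockets applies per upstream host and port):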

// keepAlive reuses sockets across requests; maxSockets: 1 allows one socket per upstream
const agent = new http.Agent({keepAlive: true, maxSockets: 1})

const httpServer = http.createServer(function (req, res) {
  const query = url.parse(req.url, true).query

  if (query.version === 'v1') {
    http.get('http://localhost:3000', {agent}, (proxyRes) => {
      proxyRes.pipe(res)
    })
  } else {
    http.get('http://localhost:3001', {agent}, (proxyRes) => {
      proxyRes.pipe(res)
    })
  }
})

Then we run autocannon -c 400 -d 100 http://localhost:8000 as a load test, that is, 400 concurrent connections for 100 seconds.

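The test shows that:

During the load test, requests to http://localhost:8000 time out.
During the load test, memory usage grows rapidly.
After the load test ends, requests to http://localhost:8000 still time out and memory usage falls slowly; a response only comes back after a long wait.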

We can think of a TCP connection as a railway line, with the content of an HTTP message split into several carriages that travel along it.

Because there is only one track between the Proxy and the Server, requests that arrive from the Client too quickly have to queue up and wait to be processed.

This explains why requests time out during the load test.
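A back-of-the-envelope model makes the size of the queue concrete. It is a sketch under stated assumptions, not a measurement: simulateQueue is a hypothetical function, the 10-second client timeout is an assumed value, and the model assumes that a client which times out sends a new request while the proxy keeps the abandoned one in its queue (the proxy code never aborts an upstream request).

```javascript
// Rough model of the agent's queue during the load test. Every number is an assumption.
function simulateQueue({clients, durationSec, clientTimeoutSec, serviceSec, maxSockets}) {
  const servedPerSec = maxSockets / serviceSec // one socket, 1 s per response -> 1 request/s
  let queued = 0
  let served = 0

  for (let t = 0; t < durationSec; t++) {
    // Assumed client behaviour: every clientTimeoutSec each connection gives up
    // and sends a new request, while the proxy keeps the old one in its queue.
    if (t % clientTimeoutSec === 0) queued += clients

    const done = Math.min(queued, servedPerSec)
    queued -= done
    served += done
  }

  return {queued, served, drainSec: queued / servedPerSec}
}

const result = simulateQueue({
  clients: 400, // autocannon -c 400
  durationSec: 100, // autocannon -d 100
  clientTimeoutSec: 10, // assumed client-side timeout
  serviceSec: 1, // child.js answers after 1000 ms
  maxSockets: 1, // the agent setting above
})
console.log(result) // { queued: 3900, served: 100, drainSec: 3900 }
```

Under these assumptions about 3,900 requests are still queued when the test stops and need roughly 65 minutes to drain. That is consistent with the slow recovery observed after the test, though the real figures depend on how the client actually handles timeouts.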

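Moreover, because the Proxy has created many "requests" that are waiting in the queue, memory also grows rapidly. This can be analyzed further with Node.js's inspect feature.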
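Before attaching a debugger, a quick way to confirm the queue is to log the agent's own bookkeeping next to heap usage. The sketch below relies on the documented agent.requests and agent.sockets objects, which hold one array per upstream; agentSnapshot and countEntries are hypothetical helper names, and the logging interval is an arbitrary choice.

```javascript
// Sum the lengths of the per-upstream arrays in agent.requests or agent.sockets.
function countEntries(table) {
  return Object.values(table || {}).reduce((n, list) => n + list.length, 0)
}

// Hypothetical helper: summarise what the agent is holding and how big the heap is.
function agentSnapshot(agent) {
  return {
    queuedRequests: countEntries(agent.requests), // requests still waiting for a socket
    activeSockets: countEntries(agent.sockets), // sockets currently in use
    heapUsedMB: Math.round(process.memoryUsage().heapUsed / 1024 / 1024),
  }
}

// Usage (assumption: `agent` is the keep-alive agent created in index.js):
// setInterval(() => console.log(agentSnapshot(agent)), 1000).unref()
```

If the explanation above holds, queuedRequests should climb together with heapUsedMB during the load test while activeSockets stays at 1.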

Concretely, add the --inspect flag when starting the Node.js process. For child processes started with fork, the flag can be specified through execArgv, as shown below:

const child1 = cp.fork('child.js', [], {
  env: {PORT: 3000},
  execArgv: ['--inspect=9999'],
})
const child2 = cp.fork('child.js', [], {
  env: {PORT: 3001},
  execArgv: ['--inspect=9998'],
})

Then open Chrome's debugging panel, click the Node.js DevTools entry, and add three connections, one per process, to inspect the main process and both children.