This is a simplified implementation of the paper LogMine: Fast Pattern Recognition for Log Analytics. The idea is to use a distance function to calculate the distance between two log lines and group similar lines into clusters.
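To illustrate the idea, here is a minimal sketch of a token-based distance and a one-pass clustering loop in the spirit of the paper. It is not this package's actual implementation: the names `tokenize`, `distance`, `clusterLines`, `SketchCluster`, and the `maxDist` threshold are hypothetical and exist only for this example, and the scoring is reduced to plain token equality.

```typescript
// Illustrative sketch only: these names are hypothetical and are NOT the logmining API.
type Tokens = string[];

// Split a log line into whitespace-separated tokens.
function tokenize(line: string): Tokens {
  return line.trim().split(/\s+/).filter((t) => t.length > 0);
}

// Distance in [0, 1]: 0 when every aligned token matches, 1 when none do.
// Tokens are compared position by position; the longer line sets the denominator.
function distance(a: Tokens, b: Tokens): number {
  const maxLen = Math.max(a.length, b.length);
  if (maxLen === 0) return 0; // two empty lines are treated as identical
  const minLen = Math.min(a.length, b.length);
  let matches = 0;
  for (let i = 0; i < minLen; i++) {
    if (a[i] === b[i]) matches++;
  }
  return 1 - matches / maxLen;
}

interface SketchCluster {
  representative: Tokens; // tokens of the first line that opened the cluster
  members: string[];      // raw log lines assigned to the cluster
}

// One pass over the input: a line joins the first cluster whose representative
// is within maxDist, otherwise it opens a new cluster.
function clusterLines(lines: string[], maxDist: number): SketchCluster[] {
  const clusters: SketchCluster[] = [];
  for (const line of lines) {
    const tokens = tokenize(line);
    const home = clusters.find((c) => distance(c.representative, tokens) <= maxDist);
    if (home) {
      home.members.push(line);
    } else {
      clusters.push({ representative: tokens, members: [line] });
    }
  }
  return clusters;
}
```

With `maxDist` set to 0.3, the lines "user alice logged in" and "user bob logged in" differ in one of four tokens (distance 0.25) and land in the same cluster, while an unrelated line such as "disk full on /dev/sda1" opens a new one. A smaller threshold yields more, tighter clusters.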
npm install logmining
import { Cluster, clustering, ILog, Token, TokenType } from "logmining";
const logs: ILog[] = ...
const clusters = clustering(logs);
//view clusters
Query Kusto:
database('vscode-ext-aggregate').table('teamsfx_all')
| where ExtensionName == "ms-teams-vscode-extension"
| where ServerTimestamp >= datetime(2021-6-28)
| extend event = trim_start("ms-teams-vscode-extension/", EventName)
| extend component = tostring(Properties["component"])
| extend success = tostring(Properties["success"])
| extend appid = tostring(Properties["appid"])
| extend correlationId = tostring(Properties["correlation-id"])
| extend resources = tostring(Properties["resources"])
| extend errorType = tostring(Properties["error-type"])
| extend errorCode = tostring(Properties["error-code"])
| extend errorMsg = tostring(Properties["error-message"])
| project ServerTimestamp, version=ExtensionVersion, event, component, success, errorType, errorCode, errorMsg, machineId=VSCodeMachineId, correlationId
| where success == "no"
| where errorType == "system"
| where version matches regex "^2.6.0$"
Export the data in Excel format:
Run the clustering program on your exported Excel data:
npm install
npm run build
node .\dist\processErrorMsg.js <error excel file path>
The clustering program writes two result files to the same folder as the input Excel file: one HTML file and one JSON file.
The HTML file lists the clusters, ordered by cluster size:
The JSON file contains the cluster data, including some basic statistics for each cluster: