# Efficient-LLMAgent-Survey

We are currently writing a survey on Efficient LLM Agent Serving and welcome comments on this list!

This repository maintains a curated list of papers related to Large Language Model Based Agents (LLM Agents), especially focusing on efficient serving methods for LLM Agents.

This paper list covers the main aspects of efficient serving methods for LLM Agents. Table of contents:

## What is LLM Agent

## Efficient Serving LLM Agent

### LLM Serving

### Planning

### Tool and Action

### Serverless

### Memory

### Component Collaboration and Agent Framework

Focus: improving the efficiency of data exchange and data transmission within AI agents.

### Device-Edge-Cloud Collaboration

## LLM and Agent Framework

### LLM Framework


**Efficient Training**

- DeepSpeed [Code]
- Megatron [Code]
- Alpa [Code]
- ColossalAI [Code]
- FairScale [Code]
- Pax [Code]
- Composer [Code]

**Efficient Inference**

- vLLM [Code]
- TensorRT-LLM [Code]
- LightLLM [Code]
- OpenLLM [Code]
- Ray-LLM [Code]
- MLC-LLM [Code]
- Sax [Code]
- Mosec [Code]

**Efficient Fine-Tuning**

- LLM-Foundry [Code]

### GenAI Develop Engine

### Agent Framework

## Benchmark, Trace, and Dataset

- BurstGPT: Towards Efficient and Reliable LLM Serving: A Real-World Workload Study

## LLM and Agent on Mobile Platform

## Survey Papers

## Others