From e344552571e852d0c60bc5399bb9b2c8e0fed19d Mon Sep 17 00:00:00 2001 From: Russ Tedrake Date: Fri, 29 Sep 2023 08:24:26 -0400 Subject: [PATCH] placeholder for notes on diffusion policy for LQG --- chapters.json | 3 +- htmlbook | 2 +- imitation.html | 127 +++++++++++++++++++++++++++++++++++++++++++ output_feedback.html | 2 +- 4 files changed, 131 insertions(+), 3 deletions(-) create mode 100644 imitation.html diff --git a/chapters.json b/chapters.json index b8da9f54..30c5be8a 100644 --- a/chapters.json +++ b/chapters.json @@ -34,6 +34,7 @@ "drake": "Appendix" }, "draft_chapter_ids": [ - "belief" + "belief", + "imitation" ] } diff --git a/htmlbook b/htmlbook index ebfd9bf4..d2e682ed 160000 --- a/htmlbook +++ b/htmlbook @@ -1 +1 @@ -Subproject commit ebfd9bf495d1614472c96c26cbf1c0907f4ef4e3 +Subproject commit d2e682ed1bf4806c22e94a3a9cdd42f5fd55c8f7 diff --git a/imitation.html b/imitation.html new file mode 100644 index 00000000..1a1fa632 --- /dev/null +++ b/imitation.html @@ -0,0 +1,127 @@ + + + + + + Ch. DRAFT - Imitation Learning + + + + + + + + + + + + + + + + + + + + +
+
+

Underactuated Robotics

+

Algorithms for Walking, Running, Swimming, Flying, and Manipulation

+

Russ Tedrake

+

+ © Russ Tedrake, 2023
+ Last modified .
+ + How to cite these notes, use annotations, and give feedback.
+

+
+
+ +

Note: These are working notes used for a course being taught at MIT. They will be updated throughout the Spring 2023 semester. Lecture videos are available on YouTube.

+ + + + + +
Table of contents
+ + + +

Imitation Learning

+ +

Two dominant approaches to imitation learning are behavioral cloning and inverse reinforcement learning... +

+ +

Diffusion Policy

+ +

One particularly successful form of behavior cloning for visuomotor policies with continuous action spaces is the Diffusion Policy Chi23. The dexterous manipulation team at TRI had been working on behavior cloning for some time, but the Diffusion Policy architecture (which started as a summer intern project!) has allowed us to train policies for incredibly dexterous tasks very reliably, and to really start scaling up our ambitions for manipulation.

+ +

Diffusion Policy for LQG

+ +

Let me be clear: it almost certainly does not make sense to use a diffusion policy to implement LQG control. But because we understand LQG so well at this point, it can be helpful to work out what the Diffusion Policy looks like in this extremely simplified case.

+ +

Consider the case where we have the standard linear-Gaussian dynamical system: \begin{gather*} \bx[n+1] = \bA\bx[n] + \bB\bu[n] + \bw[n], \\ \by[n] = \bC\bx[n] + \bD\bu[n] + \bv[n], \\ \bw[n] \sim \mathcal{N}({\bf 0}, {\bf \Sigma}_w), \quad \bv[n] \sim \mathcal{N}({\bf 0}, {\bf \Sigma}_v). \end{gather*} Imagine that we create a dataset by rolling out trajectory demonstrations using the optimal LQG policy. The question is: what (exactly) does the diffusion policy learn?
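As a reminder (this is just the standard result recapped from the output-feedback chapter, not anything new here), the optimal LQG controller is a Kalman filter cascaded with the optimal LQR gain applied to the state estimate: \begin{gather*} \hat\bx[n+1] = \bA\hat\bx[n] + \bB\bu[n] + \bL\left(\by[n] - \bC\hat\bx[n] - \bD\bu[n]\right), \\ \bu[n] = -\bK\hat\bx[n]. \end{gather*} In particular, the expert demonstrator is not a static map from the current observation to the action; its output depends on the history of observations and actions through the recursive estimator state $\hat\bx[n]$.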

+ +
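To make the rollout procedure concrete, here is a minimal numerical sketch (my own illustration, not code from the course repository) that generates such demonstrations with numpy/scipy, assuming $\bD = 0$ and placeholder choices for the system, noise, and cost matrices:

```python
# Minimal sketch: generate LQG demonstration rollouts for the linear-Gaussian
# system above (with D = 0).  All matrices here are placeholder values.
import numpy as np
from scipy.linalg import solve_discrete_are

rng = np.random.default_rng(0)

A = np.array([[1.0, 0.1], [0.0, 1.0]])   # placeholder dynamics
B = np.array([[0.0], [0.1]])
C = np.array([[1.0, 0.0]])
Sigma_w = 0.01 * np.eye(2)               # process noise covariance
Sigma_v = 0.01 * np.eye(1)               # measurement noise covariance
Q, R = np.eye(2), np.eye(1)              # LQR cost matrices

# Optimal LQR gain: u[n] = -K @ xhat[n].
S = solve_discrete_are(A, B, Q, R)
K = np.linalg.solve(R + B.T @ S @ B, B.T @ S @ A)

# Steady-state Kalman (measurement-update) gain from the dual Riccati equation.
P = solve_discrete_are(A.T, C.T, Sigma_w, Sigma_v)
L = P @ C.T @ np.linalg.inv(C @ P @ C.T + Sigma_v)

def lqg_rollout(num_steps=100):
    """Roll out the optimal LQG policy; return the (y, u) trajectory."""
    x = rng.multivariate_normal(np.zeros(2), np.eye(2))   # random initial state
    xhat = np.zeros(2)                                    # estimator state
    ys, us = [], []
    for _ in range(num_steps):
        y = C @ x + rng.multivariate_normal(np.zeros(1), Sigma_v)
        xhat = xhat + L @ (y - C @ xhat)   # measurement update
        u = -K @ xhat                      # LQR feedback on the estimate
        ys.append(y)
        us.append(u)
        x = A @ x + B @ u + rng.multivariate_normal(np.zeros(2), Sigma_w)
        xhat = A @ xhat + B @ u            # time update
    return np.array(ys), np.array(us)

# The demonstration dataset: (observation, action) trajectories from the expert.
demonstrations = [lqg_rollout() for _ in range(50)]
```

Each rollout yields the (observation, action) pairs that the diffusion policy would be trained to imitate, typically conditioned on a short history of recent observations.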
+ +
+ +
+ + +

References

+
    + +
1. Cheng Chi and Siyuan Feng and Yilun Du and Zhenjia Xu and Eric Cousineau and Benjamin Burchfiel and Shuran Song, "Diffusion Policy: Visuomotor Policy Learning via Action Diffusion", Proceedings of Robotics: Science and Systems, 2023.
  2. +
+

+

+ + + + + +
Table of contents
+ + + + + + diff --git a/output_feedback.html b/output_feedback.html index 86350b18..5fe787fd 100644 --- a/output_feedback.html +++ b/output_feedback.html @@ -423,7 +423,7 @@

Convex reparameterizations of $H_2$, $H_\infty$, and LQG

• Cheng Chi and Siyuan Feng and Yilun Du and Zhenjia Xu and Eric Cousineau and Benjamin Burchfiel and Shuran Song, "Diffusion Policy: Visuomotor Policy Learning via Action Diffusion", -arXiv preprint arXiv:2303.04137, 2023. +Proceedings of Robotics: Science and Systems, 2023.