diff --git a/chapters.json b/chapters.json index b8da9f54..30c5be8a 100644 --- a/chapters.json +++ b/chapters.json @@ -34,6 +34,7 @@ "drake": "Appendix" }, "draft_chapter_ids": [ - "belief" + "belief", + "imitation" ] } diff --git a/htmlbook b/htmlbook index ebfd9bf4..d2e682ed 160000 --- a/htmlbook +++ b/htmlbook @@ -1 +1 @@ -Subproject commit ebfd9bf495d1614472c96c26cbf1c0907f4ef4e3 +Subproject commit d2e682ed1bf4806c22e94a3a9cdd42f5fd55c8f7 diff --git a/imitation.html b/imitation.html new file mode 100644 index 00000000..8e46e873 --- /dev/null +++ b/imitation.html @@ -0,0 +1,127 @@ + + + + + + Ch. DRAFT - Imitation Learning + + + + + + + + + + + + + + + + + + + + +
+
+

Underactuated Robotics

+

Algorithms for Walking, Running, Swimming, Flying, and Manipulation

+

Russ Tedrake

+

+ © Russ Tedrake, 2023
+ Last modified .
+ + How to cite these notes, use annotations, and give feedback.
+

+
+
+ +

Note: These are working notes used for a course being taught at MIT. They will be updated throughout the Spring 2023 semester. Lecture videos are available on YouTube.

+ + + + + +
Table of contents
+ + + +

Imitation Learning

+ +

Two dominant approaches to imitation learning are behavioral cloning and inverse reinforcement learning. Behavioral cloning treats imitation as a supervised learning problem, regressing directly from observations to the expert's actions; inverse reinforcement learning instead infers a cost (or reward) function that explains the demonstrations and then solves the resulting optimal control problem...
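As a minimal sketch of the behavioral cloning idea (the network, data shapes, and training loop below are illustrative assumptions, not anything prescribed in these notes), the policy is fit by straightforward supervised regression on expert observation-action pairs:

# Behavioral cloning as supervised regression (illustrative sketch only).
import torch
import torch.nn as nn

obs_dim, action_dim = 10, 2   # placeholder dimensions
policy = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(),
                       nn.Linear(64, action_dim))
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)

def train_behavioral_cloning(expert_obs, expert_actions, num_epochs=100):
    # expert_obs: [N, obs_dim], expert_actions: [N, action_dim], collected
    # by rolling out the expert (demonstration) policy.
    for _ in range(num_epochs):
        optimizer.zero_grad()
        loss = nn.functional.mse_loss(policy(expert_obs), expert_actions)
        loss.backward()
        optimizer.step()
    return policy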

+ +

Diffusion Policy

+ +

One particularly successful form of behavior cloning for visuomotor policies with continuous action spaces is the Diffusion Policy Chi23. The dexterous manipulation team at TRI had been working on behavior cloning for some time, but the Diffusion Policy architecture (which started as a summer intern project!) has allowed us to very reliably train policies for incredibly dexterous tasks and to really start to scale up our ambitions for manipulation.
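At its core (glossing over the architectural details in Chi23; the noise schedule, shapes, and network interface below are assumptions for illustration), a diffusion policy is trained as a conditional denoiser: noise is added to an expert action (or action sequence), and a network is trained to predict that noise given the current observations.

# One DDPM-style training step for a diffusion policy (illustrative sketch).
import torch
import torch.nn as nn

num_diffusion_steps = 100
betas = torch.linspace(1e-4, 0.02, num_diffusion_steps)
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)

def diffusion_loss(noise_pred_net, obs, actions):
    # noise_pred_net(obs, noisy_actions, k) -> predicted noise; any
    # conditional architecture can play this role.
    batch = actions.shape[0]
    k = torch.randint(0, num_diffusion_steps, (batch,))
    noise = torch.randn_like(actions)
    a_bar = alphas_cumprod[k].reshape(batch, *([1] * (actions.dim() - 1)))
    noisy_actions = a_bar.sqrt() * actions + (1 - a_bar).sqrt() * noise
    return nn.functional.mse_loss(noise_pred_net(obs, noisy_actions, k), noise)

At run time, an action is generated by starting from Gaussian noise and iteratively denoising it with this network, conditioned on the current observations.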

+ +

Diffusion Policy for LQG

+ +

Let me be clear: it almost certainly does not make sense to use a diffusion policy to implement LQG control. But because we understand LQG so well at this point, it can be helpful to understand what the Diffusion Policy looks like in this extremely simplified case.

+ +

Consider the case where we have the standard linear-Gaussian dynamical system: \begin{gather*} \bx[n+1] = \bA\bx[n] + \bB\bu[n] + \bw[n], \\ \by[n] = \bC\bx[n] + \bD\bu[n] + \bv[n], \\ \bw[n] \sim \mathcal{N}({\bf 0}, {\bf \Sigma}_w), \quad \bv[n] \sim \mathcal{N}({\bf 0}, {\bf \Sigma}_v). \end{gather*} Imagine that we create a dataset by rolling out trajectory demonstrations using the optimal LQG policy. The question is: what (exactly) does the diffusion policy learn?
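For concreteness, a demonstration dataset of this kind might be generated as follows (the particular matrices, the quadratic cost used to define the LQR gain, and the horizon are placeholder assumptions; I also take $\bD = 0$ to keep the sketch simple). The expert here is the steady-state LQG controller: an LQR gain applied to a Kalman filter state estimate.

# Rolling out LQG demonstrations on a linear-Gaussian system (illustrative
# sketch; system matrices, cost, and horizon are placeholders, D = 0).
import numpy as np
from scipy.linalg import solve_discrete_are

A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
C = np.array([[1.0, 0.0]])
Sigma_w, Sigma_v = 0.01 * np.eye(2), 0.01 * np.eye(1)
Q, R = np.eye(2), np.eye(1)    # assumed quadratic cost defining the LQR gain

# Steady-state LQR gain, u[n] = -K xhat[n].
X = solve_discrete_are(A, B, Q, R)
K = np.linalg.solve(R + B.T @ X @ B, B.T @ X @ A)
# Steady-state (predictor-form) Kalman gain.
P = solve_discrete_are(A.T, C.T, Sigma_w, Sigma_v)
L = A @ P @ C.T @ np.linalg.inv(C @ P @ C.T + Sigma_v)

def rollout(num_steps=100, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    x = rng.multivariate_normal(np.zeros(2), np.eye(2))
    xhat = np.zeros(2)
    demo = []   # (observation, action) pairs for the imitation learner
    for _ in range(num_steps):
        y = C @ x + rng.multivariate_normal(np.zeros(1), Sigma_v)
        u = -K @ xhat               # depends on y[0..n-1] through xhat
        demo.append((y.copy(), u.copy()))
        xhat = A @ xhat + B @ u + L @ (y - C @ xhat)
        x = A @ x + B @ u + rng.multivariate_normal(np.zeros(2), Sigma_w)
    return demo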

+ +
+ +
+ +
+ + +

References

+
    + +
  1. Cheng Chi and Siyuan Feng and Yilun Du and Zhenjia Xu and Eric Cousineau and Benjamin Burchfiel and Shuran Song, "Diffusion Policy: Visuomotor Policy Learning via Action Diffusion", Proceedings of Robotics: Science and Systems, 2023.

  2. +
+

+

+ + + + + +
Table of contents
+ + + + + + diff --git a/output_feedback.html b/output_feedback.html index 86350b18..5fe787fd 100644 --- a/output_feedback.html +++ b/output_feedback.html @@ -423,7 +423,7 @@

Convex reparameterizations of $H_2$, $H_\infty$, and LQG

  • Cheng Chi and Siyuan Feng and Yilun Du and Zhenjia Xu and Eric Cousineau and Benjamin Burchfiel and Shuran Song, "Diffusion Policy: Visuomotor Policy Learning via Action Diffusion", -arXiv preprint arXiv:2303.04137, 2023. +Proceedings of Robotics: Science and Systems , 2023.