diff --git a/chapters.json b/chapters.json
index b8da9f54..30c5be8a 100644
--- a/chapters.json
+++ b/chapters.json
@@ -34,6 +34,7 @@
     "drake": "Appendix"
   },
   "draft_chapter_ids": [
-    "belief"
+    "belief",
+    "imitation"
   ]
 }
diff --git a/htmlbook b/htmlbook
index ebfd9bf4..d2e682ed 160000
--- a/htmlbook
+++ b/htmlbook
@@ -1 +1 @@
-Subproject commit ebfd9bf495d1614472c96c26cbf1c0907f4ef4e3
+Subproject commit d2e682ed1bf4806c22e94a3a9cdd42f5fd55c8f7
diff --git a/imitation.html b/imitation.html
new file mode 100644
index 00000000..8e46e873
--- /dev/null
+++ b/imitation.html
@@ -0,0 +1,127 @@
Underactuated Robotics
Algorithms for Walking, Running, Swimming, Flying, and Manipulation

© Russ Tedrake, 2023
Last modified .
How to cite these notes, use annotations, and give feedback.
Note: These are working notes used for a course being taught at MIT. They will be updated throughout the Spring 2023 semester. Lecture videos are available on YouTube.
Two dominant approaches to imitation learning are behavioral cloning and inverse reinforcement learning...
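As a minimal illustration of the behavioral-cloning view, imitation can be posed as ordinary supervised learning on expert observation-action pairs. The sketch below uses a simple affine least-squares fit; the model class and the function/variable names are placeholder choices for illustration, not anything prescribed here.

# A minimal behavioral-cloning sketch: fit a policy pi(o) ~ u to expert
# (observation, action) pairs by least squares. The affine model is an
# arbitrary illustrative choice.
import numpy as np

def behavior_cloning(observations, actions):
    """observations: (N, obs_dim), actions: (N, act_dim) from demonstrations."""
    O = np.hstack([observations, np.ones((observations.shape[0], 1))])  # affine features
    W, *_ = np.linalg.lstsq(O, actions, rcond=None)   # minimize ||O W - actions||^2
    return lambda o: np.append(o, 1.0) @ W            # learned policy pi(o)

Inverse reinforcement learning, in contrast, infers a cost function that explains the demonstrations and then solves for a policy with respect to that cost.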
One particularly successful form of behavior cloning for visuomotor policies with continuous action spaces is the Diffusion Policy.
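To make the training objective concrete, here is a minimal sketch of the denoising-diffusion objective underlying such a policy: a network is trained to predict the noise that was added to a demonstrated action, conditioned on the corresponding observation. This is a stripped-down stand-in (a single action per step, a small fully-connected network, arbitrary dimensions and noise schedule); the actual Diffusion Policy denoises a sequence of future actions conditioned on a short history of visual observations.

# A DDPM-style training step for an observation-conditioned action denoiser
# (a toy stand-in for a diffusion policy; all sizes and the schedule are
# arbitrary illustrative choices).
import torch
import torch.nn as nn

obs_dim, act_dim, T = 10, 2, 100                  # T = number of diffusion steps
betas = torch.linspace(1e-4, 0.02, T)             # linear noise schedule
alpha_bars = torch.cumprod(1.0 - betas, dim=0)    # cumulative products of (1 - beta_t)

# eps_theta maps [noisy action, observation, normalized step] -> noise estimate.
eps_theta = nn.Sequential(
    nn.Linear(act_dim + obs_dim + 1, 256), nn.ReLU(),
    nn.Linear(256, 256), nn.ReLU(),
    nn.Linear(256, act_dim),
)
opt = torch.optim.Adam(eps_theta.parameters(), lr=1e-4)

def training_step(obs, act):
    """obs: (batch, obs_dim), act: (batch, act_dim) expert pairs."""
    t = torch.randint(0, T, (obs.shape[0],))                     # random diffusion step per sample
    ab = alpha_bars[t].unsqueeze(-1)
    eps = torch.randn_like(act)                                  # the noise to be predicted
    noisy_act = ab.sqrt() * act + (1 - ab).sqrt() * eps          # forward (noising) process
    pred = eps_theta(torch.cat([noisy_act, obs, t.unsqueeze(-1) / T], dim=-1))
    loss = nn.functional.mse_loss(pred, eps)                     # standard epsilon-prediction loss
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

At test time, an action is generated by sampling Gaussian noise and running the learned reverse (denoising) process, again conditioned on the current observation.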
Let me be clear: it almost certainly does not make sense to use a diffusion policy to implement LQG control. But because we understand LQG so well at this point, it can be helpful to understand what the Diffusion Policy looks like in this extremely simplified case.
Consider the case where we have the standard linear-Gaussian dynamical system: \begin{gather*} \bx[n+1] = \bA\bx[n] + \bB\bu[n] + \bw[n], \\ \by[n] = \bC\bx[n] + \bD\bu[n] + \bv[n], \\ \bw[n] \sim \mathcal{N}({\bf 0}, {\bf \Sigma}_w), \quad \bv[n] \sim \mathcal{N}({\bf 0}, {\bf \Sigma}_v). \end{gather*} Imagine that we create a dataset by rolling out trajectory demonstrations using the optimal LQG policy. The question is: what (exactly) does the diffusion policy learn?
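To be concrete about what that training data looks like, here is a sketch that synthesizes the LQG controller (an LQR gain plus a steady-state Kalman predictor, each from a discrete algebraic Riccati equation) and rolls out the closed loop, recording observation-action pairs as demonstrations. The double-integrator matrices, cost weights, and noise covariances are arbitrary illustrative choices.

# A sketch of generating LQG "expert demonstrations" for the system above.
# A, B, C, D, the cost weights Q, R, and the noise covariances are arbitrary
# illustrative choices (a noisy double integrator observing position only).
import numpy as np
from scipy.linalg import solve_discrete_are

rng = np.random.default_rng(0)

A = np.array([[1.0, 0.1], [0.0, 1.0]])   # x[n+1] = A x[n] + B u[n] + w[n]
B = np.array([[0.0], [0.1]])
C = np.array([[1.0, 0.0]])               # y[n] = C x[n] + D u[n] + v[n]
D = np.zeros((1, 1))
Sigma_w, Sigma_v = 0.01 * np.eye(2), 0.01 * np.eye(1)

# LQR gain from the control Riccati equation.
Q, R = np.eye(2), np.eye(1)
S = solve_discrete_are(A, B, Q, R)
K = np.linalg.solve(R + B.T @ S @ B, B.T @ S @ A)

# Steady-state (predictor-form) Kalman gain from the dual Riccati equation.
P = solve_discrete_are(A.T, C.T, Sigma_w, Sigma_v)
L = A @ P @ C.T @ np.linalg.inv(C @ P @ C.T + Sigma_v)

def lqg_rollout(x0, N=100):
    """Closed-loop LQG rollout; returns a list of (observation, action) pairs."""
    x, xhat, demo = x0, np.zeros_like(x0), []
    for _ in range(N):
        u = -K @ xhat                                          # certainty-equivalent LQR on the estimate
        y = C @ x + D @ u + rng.multivariate_normal(np.zeros(1), Sigma_v)
        demo.append((y.copy(), u.copy()))
        xhat = A @ xhat + B @ u + L @ (y - C @ xhat - D @ u)   # Kalman predictor update
        x = A @ x + B @ u + rng.multivariate_normal(np.zeros(2), Sigma_w)
    return demo

demonstrations = [lqg_rollout(rng.standard_normal(2)) for _ in range(50)]

These observation-action pairs play the role of the demonstrations in the denoising objective sketched above.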