From c05354d3f3f39b57c1c9f6b408b1127147279f03 Mon Sep 17 00:00:00 2001
From: michele-milesi <74559684+michele-milesi@users.noreply.github.com>
Date: Tue, 12 Dec 2023 11:53:09 +0100
Subject: [PATCH] fix: code copy button (#20)

---
 _posts/2023-05-16-functionality-checks.md |  2 ++
 _posts/2023-05-17-welcome.md              |  4 ++--
 _posts/2023-07-06-dreamer_v2.md           |  4 ++++
 _posts/2023-08-10-dreamer_v3.md           |  4 ++++
 assets/js/main.js                         | 11 ++++++++++-
 5 files changed, 22 insertions(+), 3 deletions(-)

diff --git a/_posts/2023-05-16-functionality-checks.md b/_posts/2023-05-16-functionality-checks.md
index 6905b67..069d36b 100644
--- a/_posts/2023-05-16-functionality-checks.md
+++ b/_posts/2023-05-16-functionality-checks.md
@@ -14,6 +14,7 @@ subclass: 'post'
---

# Available Functionalities
+
+
+
git clone https://github.com/Eclectic-Sheep/sheeprl.git
cd sheeprl
python3.10 -m venv .venv
diff --git a/_posts/2023-07-06-dreamer_v2.md b/_posts/2023-07-06-dreamer_v2.md
index 43a6436..9c69513 100644
--- a/_posts/2023-07-06-dreamer_v2.md
+++ b/_posts/2023-07-06-dreamer_v2.md
@@ -93,6 +93,8 @@ Our PyTorch implementation aims to be a simple, scalable and well-documented rep
As an example, the implementation of the *KL balancing* directly follows the equation above:
+<div class="with-new-line" markdown="1">
+
```python
from torch.distributions import Independent, OneHotCategoricalStraightThrough
@@ -108,6 +110,8 @@ rhs = kl_divergence(
kl_loss = alpha * lhs + (1 - alpha) * rhs
```
+
+</div>
Do you want to know more about how we implemented Dreamer-V2? Check out [our implementation](https://github.com/Eclectic-Sheep/sheeprl/tree/main/sheeprl/algos/dreamer_v2){:target="_blank"}.
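Pieced together outside the elided hunk, KL balancing can be sketched end to end as follows. This is an illustrative sketch, not SheepRL's exact code: the function name, the tensor shapes, and the default `alpha = 0.8` (the value used in the Dreamer-V2 paper) are assumptions.

```python
import torch
from torch.distributions import Independent, OneHotCategoricalStraightThrough
from torch.distributions.kl import kl_divergence


def kl_balancing_loss(
    posterior_logits: torch.Tensor,
    prior_logits: torch.Tensor,
    alpha: float = 0.8,  # assumed default, as in the Dreamer-V2 paper
) -> torch.Tensor:
    # Logits are assumed to have shape (batch, stochastic, classes);
    # Independent(..., 1) sums the KL over the stochastic dimension.
    # lhs: KL(sg(posterior) || prior) trains the prior toward the posterior.
    lhs = kl_divergence(
        Independent(OneHotCategoricalStraightThrough(logits=posterior_logits.detach()), 1),
        Independent(OneHotCategoricalStraightThrough(logits=prior_logits), 1),
    )
    # rhs: KL(posterior || sg(prior)) regularizes the posterior toward the prior.
    rhs = kl_divergence(
        Independent(OneHotCategoricalStraightThrough(logits=posterior_logits), 1),
        Independent(OneHotCategoricalStraightThrough(logits=prior_logits.detach()), 1),
    )
    return alpha * lhs + (1 - alpha) * rhs
```

The `detach()` calls implement the stop-gradient of the two KL terms, so the prior and the posterior are pulled toward each other at the different rates `alpha` and `1 - alpha`.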
### References
diff --git a/_posts/2023-08-10-dreamer_v3.md b/_posts/2023-08-10-dreamer_v3.md
index b450980..da236f7 100644
--- a/_posts/2023-08-10-dreamer_v3.md
+++ b/_posts/2023-08-10-dreamer_v3.md
@@ -68,6 +68,8 @@ $$
#### Uniform Mix
To prevent spikes in the KL loss, the categorical distributions (the one for the discrete actions and the one for the posteriors/priors) are parametrized as mixtures of $1\%$ uniform and $99\%$ neural-network output. This prevents the distributions from becoming nearly deterministic. To implement the *uniform mix*, we apply the *uniform mix* function to the logits returned by the neural networks.
+<div class="with-new-line" markdown="1">
+
```python
import torch
from torch import Tensor
@@ -86,6 +88,8 @@ def uniform_mix(self, logits: Tensor, unimix: float = 0.01) -> Tensor:
return logits
```
+
+</div>
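Since the middle of the hunk above is elided, the *uniform mix* can be sketched in full as a free function (the standalone signature and the last-dimension softmax are assumptions; the class-method version in SheepRL may differ):

```python
import torch
from torch import Tensor


def uniform_mix(logits: Tensor, unimix: float = 0.01) -> Tensor:
    # Blend (1 - unimix) of the softmax probabilities with unimix of a
    # uniform distribution over the classes, then map back to logits.
    if unimix > 0.0:
        probs = logits.softmax(dim=-1)
        uniform = torch.ones_like(probs) / probs.shape[-1]
        probs = (1.0 - unimix) * probs + unimix * uniform
        logits = probs.log()
    return logits
```

After the mix, every class keeps a probability of at least `unimix / num_classes`, which is exactly what keeps the distribution away from determinism.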
#### Return regularizer for the policy
The main difficulty in the Dreamer-V2 *actor learning phase* is choosing the entropy regularizer, which depends heavily on the scale and frequency of the rewards. To use a single entropy coefficient, the returns must be normalized with moving statistics. In particular, the authors found it more convenient to scale down large returns without scaling up small ones, so as not to amplify noise.
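The moving-statistics normalization described above can be sketched as follows. This is an illustrative class, not SheepRL's code: the 5th/95th percentiles, the EMA `decay`, and the `limit` of 1 are assumptions based on the Dreamer-V3 paper's recipe.

```python
import torch


class ReturnNormalizer:
    """Scale returns by an EMA of their 5th-95th percentile range,
    shrinking large returns while leaving small ones untouched."""

    def __init__(self, decay: float = 0.99, limit: float = 1.0):
        self.decay = decay
        self.limit = limit
        self.range_ema = None  # EMA of the percentile range

    def __call__(self, returns: torch.Tensor) -> torch.Tensor:
        low = torch.quantile(returns, 0.05)
        high = torch.quantile(returns, 0.95)
        rng = high - low
        if self.range_ema is None:
            self.range_ema = rng
        else:
            self.range_ema = self.decay * self.range_ema + (1.0 - self.decay) * rng
        # Divide by max(limit, range): large returns are scaled down,
        # small returns (range below `limit`) pass through unchanged.
        scale = torch.clamp(self.range_ema, min=self.limit)
        return returns / scale
```

Clamping the divisor at `limit` is what implements "scale down large returns, never scale up small ones": when the percentile range is below 1 the divisor is 1 and the returns are left as they are.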
diff --git a/assets/js/main.js b/assets/js/main.js
index 8e6ddce..d7c0e15 100644
--- a/assets/js/main.js
+++ b/assets/js/main.js
@@ -49,7 +49,8 @@ $(document).ready(function () {
});
// Document Ctrl + C
- const sources = document.querySelectorAll("code:not(.with-new-line)");
+ const sources = document.querySelectorAll(":not(.with-new-line) code");
+ const sources_new_line = document.querySelectorAll(".with-new-line code");
sources.forEach(source => {
source.addEventListener("copy", (event) => {
@@ -58,4 +59,12 @@ $(document).ready(function () {
event.preventDefault();
});
});
+
+ sources_new_line.forEach(source => {
+ source.addEventListener("copy", (event) => {
+ const selection = document.getSelection();
+ event.clipboardData.setData("text/plain", selection.toString());
+ event.preventDefault();
+ });
+ });
});
\ No newline at end of file