diff --git a/contents/benchmarking/benchmarking.qmd b/contents/benchmarking/benchmarking.qmd
index acd04e57..2a149fbd 100644
--- a/contents/benchmarking/benchmarking.qmd
+++ b/contents/benchmarking/benchmarking.qmd
@@ -563,7 +563,7 @@ The [Common Objects in Context (COCO) dataset][https://cocodataset.org/](@lin201
 
 #### GPT-3 (2020)
 
-While the above examples primarily focus on image datasets, there have been significant developments in text datasets as well. One notable example is GPT-3[@brown2020language], developed by OpenAI. GPT-3 is a language model trained on a diverse range of internet text. Although the dataset used to train GPT-3 is not publicly available, the model itself, consisting of 175 billion parameters, is a testament to the scale and complexity of modern machine learning datasets and models.
+While the above examples primarily focus on image datasets, there have been significant developments in text datasets as well. One notable example is GPT-3 [@brown2020language], developed by OpenAI. GPT-3 is a language model trained on a diverse range of internet text. Although the dataset used to train GPT-3 is not publicly available, the model itself, consisting of 175 billion parameters, is a testament to the scale and complexity of modern machine learning datasets and models.
 
 #### Present and Future
 
@@ -667,7 +667,7 @@ As machine learning models become more sophisticated, so do the benchmarks requi
 
 **Out-of-Distribution Generalization**: Testing how well models perform on data that is different from the original training distribution. This evaluates the model's ability to generalize to new, unseen data. Example benchmarks are Wilds [@koh2021wilds], RxRx, and ANC-Bench.
 
-**Adversarial Robustness:** Evaluating model performance under adversarial attacks or perturbations to the input data. This tests the model's robustness. Example benchmarks are ImageNet-A[@hendrycks2021natural], ImageNet-C[@xie2020adversarial], and CIFAR-10.1.
+**Adversarial Robustness:** Evaluating model performance under adversarial attacks or perturbations to the input data. This tests the model's robustness. Example benchmarks are ImageNet-A [@hendrycks2021natural], ImageNet-C [@xie2020adversarial], and CIFAR-10.1.
 
 **Real-World Performance:** Testing models on real-world datasets that closely match end tasks, rather than just canned benchmark datasets. Examples are medical imaging datasets for healthcare tasks or actual customer support chat logs for dialogue systems.