add student blogs
jocelynshen committed Nov 30, 2023
1 parent 714242a commit 32de314
Showing 249 changed files with 12,305 additions and 0 deletions.
104 changes: 104 additions & 0 deletions _posts/2022-11-09-how-cnns-learn-shapes.md
---
layout: distill
title: How CNNs learn shapes
description:
date: 2023-11-09
htmlwidgets: true

# Anonymize when submitting
# authors:
# - name: Anonymous

authors:
- name: Chloe Hong
url:
affiliations:
name: MIT

# must be the exact same name as your blogpost
bibliography: 2022-11-09-how-cnns-learn-shapes.bib

# Add a table of contents to your post.
# - make sure that TOC names match the actual section names
# for hyperlinks within the post to work correctly.
toc:
- name : Background
# - name: Equations
# - name: Images and Figures
# subsections:
# - name: Interactive Figures
# - name: Citations
# - name: Footnotes
# - name: Code Blocks
# - name: Layouts
# - name: Other Typography?

# Below is an example of injecting additional post-specific styles.
# This is used in the 'Layouts' section of this post.
# If you use this post as a template, delete this _styles block.
_styles: >
.fake-img {
background: #bbb;
border: 1px solid rgba(0, 0, 0, 0.1);
box-shadow: 0 0px 4px rgba(0, 0, 0, 0.1);
margin-bottom: 12px;
}
.fake-img p {
font-family: monospace;
color: white;
text-align: left;
margin: 12px 0;
text-align: center;
font-size: 16px;
}
---

## Background

One widely accepted intuition is that CNNs combine low-level features (e.g. edges) to gradually learn more complex and abstract shapes, detecting objects while remaining invariant to position and translation.

> As [@kriegeskorte2015deep] puts it, “the network acquires complex knowledge
> about the kinds of shapes associated with each category. [...] High-level units appear to learn
> representations of shapes occurring in natural images” (p. 429). This notion also appears in other
> explanations, such as in [@lecun2015deep]: Intermediate CNN layers recognise “parts of familiar
> objects, and subsequent layers [...] detect objects as combinations of these parts” (p. 436). We term
> this explanation the shape hypothesis.

As a result, the final prediction is based on global patterns rather than local features.

However, there have been contradictory findings showing that CNNs trained on off-the-shelf datasets are biased towards predicting the category corresponding to an object's texture rather than its shape [@geirhos2018imagenet].

{% raw %}{% include figure.html path="assets/img/2023-11-09-how-cnns-learn-shapes/shapetexture.png" class="img-fluid" %}{% endraw %}

Going further, previous works have suggested ways to increase the shape bias of CNNs, including data augmentation and relabelling.
While these works have successfully shown the discriminative bias of CNNs toward certain features, they do not identify how the network's "perception" changes.
With this project, I seek to evaluate the bias contained (i) in the latent representations and (ii) at the per-pixel level.



## Methods
I choose two approaches, from [@geirhos2018imagenet] and [@chung2022shape], that augment the dataset to achieve an increased shape bias in CNNs.
To better understand what type of shape information in the network is discriminative, where shape information is encoded, and when the network learns about object shape during training, I use an optimization method to visualize the features learned at each layer of the trained models.
By comparing the original model to the augmented version, and across different augmentation methods, we can evaluate whether there is a common pattern in the way CNNs learn shapes and what additional information is most effective in increasing shape bias in CNNs.

### Data augmentations
[@geirhos2018imagenet] increased shape bias by augmenting the data with shape-based representations.

| Features | Dataset |
|---------------|---------------------------------------|
| image | ImageNet |
| image + shape | ImageNet augmented with line drawings |
| shape | Line drawings |

[@chung2022shape] speculates that the data distribution is the root cause of discriminative biases in CNNs. To address this, they suggest a granular labeling scheme that redesigns the label space to pursue a balance between texture and shape biases.

| Labels | Dataset |
|---------------|---------------------------------------|
| categorical | ImageNet |
| categorical + style | ImageNet |


### CNN feature visualization
We visualize the features the CNN model has learned at each layer using the following optimization framework.

{% raw %}{% include figure.html path="assets/img/2023-11-09-how-cnns-learn-shapes/cnnfeaturevisualization.png" class="img-fluid" %}{% endraw %}
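
As a rough, hedged illustration of this kind of layer-wise feature visualization, the sketch below uses simple activation maximization in PyTorch with an off-the-shelf ResNet; it is my own minimal example, not necessarily the exact framework depicted in the figure, and the layer and channel choices are arbitrary.

```python
import torch
import torchvision.models as models

model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1).eval()
for p in model.parameters():
    p.requires_grad_(False)

activations = {}
def hook(module, inputs, output):
    activations["feat"] = output

# hook an intermediate stage; the choice of layer and channel is arbitrary here
model.layer3.register_forward_hook(hook)

img = torch.randn(1, 3, 224, 224, requires_grad=True)  # start from noise
optimizer = torch.optim.Adam([img], lr=0.05)

for step in range(200):
    optimizer.zero_grad()
    model(img)
    loss = -activations["feat"][0, 0].mean()  # maximize mean activation of channel 0
    loss.backward()
    optimizer.step()

# `img` now contains a pattern that strongly drives the chosen unit; comparing
# such visualizations across layers and across the differently augmented models
# is the core of the analysis.
```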

93 changes: 93 additions & 0 deletions _posts/2023-11-01-Symmetry-Optimization.md
---
layout: distill
title: Investigating the Impact of Symmetric Optimization Algorithms on Learnability
description: Recent theoretical papers in machine learning have raised concerns about the impact of symmetric optimization algorithms on learnability, citing hardness results from theoretical computer science. This project aims to empirically investigate and validate these theoretical claims by designing and conducting experiments at scale. Understanding the role of optimization algorithms in the learning process is crucial for advancing the field of machine learning.
date: 2023-11-09
htmlwidgets: true

# Anonymize when submitting
# authors:
# - name: Anonymous

authors:
- name: Kartikesh Mishra
url: ""
affiliations:
name: MIT
- name: Divya P Shyamal
url: ""
affiliations:
name: MIT

# must be the exact same name as your blogpost
bibliography: 2023-11-01-Symmetry-Optimization.bib

# Add a table of contents to your post.
# - make sure that TOC names match the actual section names
# for hyperlinks within the post to work correctly.
toc:
- name: Introduction
- name: Experimental design
subsections:
- name: Learning Tasks and Datasets
- name: Learning Algorithms
- name: Evaluation Metrics

# Below is an example of injecting additional post-specific styles.
# This is used in the 'Layouts' section of this post.
# If you use this post as a template, delete this _styles block.
_styles: >
.fake-img {
background: #bbb;
border: 1px solid rgba(0, 0, 0, 0.1);
box-shadow: 0 0px 4px rgba(0, 0, 0, 0.1);
margin-bottom: 12px;
}
.fake-img p {
font-family: monospace;
color: white;
text-align: left;
margin: 12px 0;
text-align: center;
font-size: 16px;
}
---

## Introduction

In practice, the majority of machine learning algorithms exhibit symmetry. Our objective is to explore the impact of introducing asymmetry to different components of a machine learning algorithm, such as architecture, loss function, or optimization, and assess whether this asymmetry enhances overall performance.

Andrew Ng's research <d-cite key="ng2004feature"></d-cite> (https://icml.cc/Conferences/2004/proceedings/papers/354.pdf) suggests that in scenarios requiring feature selection, employing asymmetric (or more precisely, non-rotationally invariant) algorithms can result in lower sample complexity. For instance, in the context of regularized logistic regression, the sample complexity with the L1 norm is O(log n), while with the L2 norm, it is O(n). This insight underscores the potential benefits of incorporating asymmetry, particularly in tasks involving feature selection, to achieve improved learning outcomes. Can asymmetry be more advantageous in other learning tasks? What are the costs associated with using symmetric or asymmetric learning algorithms?
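
To make the contrast concrete, here is a small, hedged sketch (using scikit-learn; my own illustrative setup rather than the cited paper's experiments) of the regime Ng describes: few samples, many irrelevant features, and L1- vs. L2-regularized logistic regression.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# 100 samples, 500 features, only 5 of which are informative
X, y = make_classification(n_samples=100, n_features=500, n_informative=5,
                           n_redundant=0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)

for penalty in ["l1", "l2"]:
    clf = LogisticRegression(penalty=penalty, solver="liblinear", C=1.0)
    clf.fit(X_tr, y_tr)
    print(penalty, "test accuracy:", clf.score(X_te, y_te))
```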

## Experimental Design

Our experiments will proceed as follows. We will have a set of datasets and a set of learning algorithms (both symmetric and asymmetric) from which we will generate models and test them on validation datasets from the same distribution on which they were trained. We will analyze the learning process as well as the performance of these learned models.

### Learning Tasks and Datasets

We plan to use MNIST, CIFAR-100, small tabular datasets such as Iris and Banknote Authentication, and a subset of ImageNet. If we complete our training on the image datasets, we may include some text-based datasets from Kaggle. Using these datasets, we plan to analyze several learning tasks: classification, regression, feature selection, and reconstruction.

### Learning Algorithms

We define a gradient-descent parametric learning algorithm to be symmetric if it uses the same function to update each parameter value. Currently, we are considering CNN models with varying numbers of convolution layers, vision transformers with varying numbers of attention blocks, and multilayer perceptrons with varying network depths. We will use dropout, skip connections, variation in activation functions, and initialization across layers to introduce asymmetry in the architecture. We will use cross-entropy and MSE as our asymmetric and symmetric loss functions, respectively. For our optimizers, we will use batch gradient descent, stochastic gradient descent, and Adam, and to introduce asymmetry we will vary the learning rates, momentum, and weight decay across parameters.

For our initial tests, we plan to compare a few pairs of multi-layer perceptrons on the MNIST dataset. Each pair is described below; a minimal sketch of the first pair follows the list.

- a 3-layer perceptron with a single learning rate l vs. a 3-layer perceptron where each layer k has its own learning rate l_k
- a 4-layer perceptron vs. a 4-layer perceptron where some neurons on the 2nd layer skip directly to the 4th layer
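
Below is a hedged sketch of the first pair, using PyTorch optimizer parameter groups to give each layer its own learning rate; the specific rates are arbitrary placeholders, not our final hyperparameters.

```python
import torch
import torch.nn as nn

def make_mlp():
    # 3-layer MLP for 28x28 MNIST images
    return nn.Sequential(
        nn.Flatten(),
        nn.Linear(784, 256), nn.ReLU(),
        nn.Linear(256, 128), nn.ReLU(),
        nn.Linear(128, 10),
    )

symmetric_model, asymmetric_model = make_mlp(), make_mlp()

# symmetric: every parameter is updated with the same learning rate l
opt_symmetric = torch.optim.SGD(symmetric_model.parameters(), lr=0.01)

# asymmetric: each linear layer k gets its own learning rate l_k (values arbitrary)
opt_asymmetric = torch.optim.SGD(
    [
        {"params": asymmetric_model[1].parameters(), "lr": 0.03},
        {"params": asymmetric_model[3].parameters(), "lr": 0.01},
        {"params": asymmetric_model[5].parameters(), "lr": 0.003},
    ],
    lr=0.01,  # default for any group that does not override it
)
```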


### Evaluation Metrics

We will evaluate the trained models using the following metrics and compare the models generated from symmetric algorithms with those from asymmetric algorithms on the same dataset (a small sketch of the noise-perturbation metric follows the list).
- validation accuracy (percentage of correct classifications)
- negative mean squared error for regression and reconstruction
- k-fold cross-validation accuracy
- accuracy on a perturbed dataset (we will use Gaussian noise)
- convergence speed during training
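
As an example of how the noise-perturbation metric could be computed, here is a short, hedged PyTorch sketch (illustrative only; the noise level sigma is a placeholder).

```python
import torch

@torch.no_grad()
def accuracy_under_noise(model, loader, sigma=0.1):
    """Accuracy of `model` on inputs corrupted with additive Gaussian noise."""
    model.eval()
    correct, total = 0, 0
    for x, y in loader:
        x_noisy = x + sigma * torch.randn_like(x)
        preds = model(x_noisy).argmax(dim=1)
        correct += (preds == y).sum().item()
        total += y.numel()
    return correct / total
```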

## Compute Resources

We plan to use Google Colab for our initial experiments and then use MIT Supercloud for training and inference on large models.
---
layout: distill
title: Visualization of CLIP's Learning and Perceiving Dynamics
description: This project aims to develop methods and tools to enhance the interpretability of AI systems, focusing on how these systems make decisions and predictions. By creating more transparent AI models, the research seeks to bridge the communication gap between humans and AI, fostering trust and efficiency in various applications, from healthcare to autonomous driving. Such advancements would not only demystify AI operations for non-experts but also aid in the ethical and responsible development of AI technologies.
date: 2023-11-01
htmlwidgets: true

# Anonymize when submitting
# authors:
# - name: Anonymous

authors:
- name: Chi-Li Cheng
url: "https://chilicheng.com"
affiliations:
name: Massachusetts Institute of Technology

# must be the exact same name as your blogpost
bibliography: 2023-11-01-Visualization of CLIP's Learning and Perceiving Dynamics.bib

# Add a table of contents to your post.
# - make sure that TOC names match the actual section names
# for hyperlinks within the post to work correctly.
toc:
- name: Project Proposal
subsections:
- name: Abstract
- name: Introduction
- name: Methodology
- name: Potential Contributions

# Below is an example of injecting additional post-specific styles.
# This is used in the 'Layouts' section of this post.
# If you use this post as a template, delete this _styles block.
_styles: >
.fake-img {
background: #bbb;
border: 1px solid rgba(0, 0, 0, 0.1);
box-shadow: 0 0px 4px rgba(0, 0, 0, 0.1);
margin-bottom: 12px;
}
.fake-img p {
font-family: monospace;
color: white;
text-align: left;
margin: 12px 0;
text-align: center;
font-size: 16px;
}
---

## Project Proposal
In this project, I delve into the intricate capabilities of the CLIP (Contrastive Language–Image Pre-training) model<d-cite key="radford2021learning"></d-cite>, renowned for its human-like ability to process both visual and textual data. Central to my research is the belief that visualization plays a crucial role in understanding complex AI systems. With this in mind, I have set two primary objectives: first, to develop innovative visualization techniques that can provide a deeper, more intuitive understanding of CLIP's learning and perception processes; and second, to analyze how the CLIP model dynamically processes sequential images or videos, focusing on visualizing and interpreting the flow field during training and the trajectory characteristics during video content processing.


### Introduction

The CLIP model, which stands for Contrastive Language–Image Pre-training, represents a groundbreaking approach to integrating visual and textual data within the realm of artificial intelligence. In my project, I undertake an in-depth exploration of this model through a two-fold approach. Initially, my focus is on developing advanced visualization techniques that are tailored to decode and highlight the intricate learning and perception mechanisms at the core of CLIP. This is inspired by detailed investigations <d-cite key="wang2020understanding"></d-cite> <d-cite key="shi2023understanding"></d-cite> <d-cite key="zhao2017exact"></d-cite> into the behavior of features on the unit sphere, offering a unique and insightful understanding of the model's operations.

Furthermore, this research extends to a thorough analysis of how the CLIP model processes sequential visual content, with a specific focus on video data. This part of my study goes beyond merely visualizing the model's feature embeddings; it involves a meticulous examination of its dynamic interpretive behaviors. By emphasizing innovative visualization methods, my aim is to demystify the complex and often abstract functionalities of the CLIP model, making these processes more accessible and understandable.

In essence, my project seeks to bridge the gap between the sophisticated computational processes of the CLIP model and our comprehension of these processes. By focusing on groundbreaking visualization techniques, I aspire to deepen our understanding of AI's learning behaviors, thereby contributing significantly to the advancement of artificial intelligence research.

### Methodology

The project involves several key methodologies (a minimal code sketch of the embedding and trajectory steps follows the list):

- Innovative Visualization of CLIP's Feature Embeddings: developing intuitive visual representations of CLIP's embeddings on a hypersphere to demystify high-dimensional data processing and understand the model's predictive mechanisms.
- Analyzing Factors Influencing CLIP's Learning: examining the impact of pretrained data quality and training dataset composition on CLIP's learning efficacy.
- Visualizing Dynamic Behavior with Sequential Images: focusing on visualizing CLIP's processing of videos to observe learning patterns and trajectory characteristics, including the creation of a specialized interface for 3D visualization.
- Experimental Analysis with Movie Clips: testing various movie clips to explore whether trajectory patterns can reveal video themes or genres, and understanding the correlation between these trajectories and cinematic content.
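
To make the embedding and trajectory steps concrete, here is a minimal, hedged sketch assuming the openai/CLIP package; the frame paths are placeholders, and the PCA projection stands in for the specialized 3D visualization interface.

```python
import clip
import torch
from PIL import Image
from sklearn.decomposition import PCA

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# placeholder list of frames extracted from a video clip
frame_paths = ["frame_000.jpg", "frame_001.jpg", "frame_002.jpg"]
images = torch.stack([preprocess(Image.open(p)) for p in frame_paths]).to(device)

with torch.no_grad():
    feats = model.encode_image(images).float()
feats = feats / feats.norm(dim=-1, keepdim=True)  # points on the unit hypersphere

# project the frame trajectory to 3D; in the actual project this would feed a
# dedicated 3D visualization interface
trajectory_3d = PCA(n_components=3).fit_transform(feats.cpu().numpy())
print(trajectory_3d.shape)  # (num_frames, 3)
```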


### Potential Contributions

The research is poised to offer significant contributions:

- Enhanced Understanding of CLIP's Learning Dynamics: insights into how data quality and dataset composition influence CLIP's learning process.
- Evaluating Training Dataset Quality: providing valuable information on the effectiveness of training datasets, potentially guiding data selection and preparation strategies.
- Semantic Trajectory Analysis in Video Content: new insights into CLIP's semantic interpretations of dynamic content, including the evolution of model perception and the formation of 'data islands'.
- Implications for Model Training and Content Analysis: the findings could lead to improved training methods for CLIP and similar models, as well as novel methods for content analysis in understanding cinematic themes and narrative structures.