Influencer Self-Disclosure Practices on Instagram: A Multi-Country Longitudinal Study

Description

This repository contains scripts and utilities for the experiments in the paper "Influencer Self-Disclosure Practices on Instagram: A Multi-Country Longitudinal Study". The main script collects and processes Instagram data through the CrowdTangle API.

Prerequisites

  1. Python: The project requires Python 3.9 or higher.

  2. CrowdTangle API Token: An API token for CrowdTangle is required to fetch data. This token should be set in a .env file in the root directory of the project, under the key API_TOKEN.

    Example:

    API_TOKEN="YOUR_API_TOKEN_HERE"
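
    A minimal sketch of how the token could be read and sanity-checked before a run. It uses only the standard library; the helper names `load_env_file` and `get_api_token` are hypothetical, and the project's own scripts may load the `.env` file differently (for example through a dotenv package).

```python
import os
from pathlib import Path


def load_env_file(path=".env"):
    """Parse simple KEY=VALUE lines from a .env file into a dict."""
    values = {}
    for raw_line in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes, as in API_TOKEN="...".
        values[key.strip()] = value.strip().strip("\"'")
    return values


def get_api_token(path=".env"):
    """Return API_TOKEN from the environment or the .env file, or raise."""
    token = os.environ.get("API_TOKEN")
    if not token and Path(path).is_file():
        token = load_env_file(path).get("API_TOKEN")
    # Treat the README placeholder as "not configured" (an assumption of this sketch).
    if not token or token == "YOUR_API_TOKEN_HERE":
        raise RuntimeError(f"API_TOKEN is missing or still a placeholder in {path}")
    return token
```

    Calling `get_api_token()` from the project root before starting a long collection run surfaces a missing or placeholder token early.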
    

CSV File Format: dataset_accounts.csv

This file serves as the metadata input for your Instagram data collection and processing. Each row represents an individual Instagram account's metadata. The CSV consists of the following columns:

  1. username (required): The Instagram handle or username of the account.

  2. country (optional): A country code (e.g. US or DE) that represents the primary audience or location of the account.

  3. size (optional): The category of the account based on its follower count (e.g. micro or mega).

  4. number_of_posts (optional): The total number of posts made by the account up to the last date of collection.

  5. followers_collection_time (optional): The follower count of the account at data collection time.

  6. first_post (required): The earliest date and time (in the format 'YYYY-MM-DD HH:MM:SS') from which posts should be collected for the respective account.

  7. last_post (required): The latest date and time (in the format 'YYYY-MM-DD HH:MM:SS') until which posts should be collected for the respective account.

Example:

username,country,size,number_of_posts,followers_collection_time,first_post,last_post
ab_bowen,US,mega,3652,1626210,2013-06-24 14:01:12,2022-09-15 06:32:49
achrafhakimi,DE,mega,612,10162234,2014-01-19 19:49:42,2022-09-16 12:39:47
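
The column rules above can be checked before starting a collection run. The following is a sketch under stated assumptions, not part of the repository: `validate_accounts_csv` is a hypothetical helper, and the check that first_post does not come after last_post is an added sanity check rather than a documented requirement.

```python
import csv
from datetime import datetime

# Columns marked as required in the format description.
REQUIRED_COLUMNS = ("username", "first_post", "last_post")
# Python equivalent of the documented 'YYYY-MM-DD HH:MM:SS' format.
DATE_FORMAT = "%Y-%m-%d %H:%M:%S"


def validate_accounts_csv(path):
    """Return a list of human-readable problems found in the accounts CSV."""
    problems = []
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        header = reader.fieldnames or []
        missing = [name for name in REQUIRED_COLUMNS if name not in header]
        if missing:
            return [f"missing required column(s): {', '.join(missing)}"]
        # Data rows start on line 2 because line 1 is the header.
        for line_no, row in enumerate(reader, start=2):
            if not (row["username"] or "").strip():
                problems.append(f"line {line_no}: empty username")
            try:
                first = datetime.strptime(row["first_post"], DATE_FORMAT)
                last = datetime.strptime(row["last_post"], DATE_FORMAT)
            except (TypeError, ValueError):
                problems.append(f"line {line_no}: dates must be YYYY-MM-DD HH:MM:SS")
                continue
            if first > last:
                problems.append(f"line {line_no}: first_post is after last_post")
    return problems
```

An empty list means the file passed these checks; otherwise each entry names the offending line.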

Setup & Installation

  1. Clone the repository:

    git clone https://github.com/thalesbertaglia/instagram-disclosure-trends
    cd instagram-disclosure-trends

  2. Install dependencies using Poetry:

    poetry install

  3. Activate the virtual environment:

    poetry shell

Usage

To run the main script:

python scripts/collect_data.py [OPTIONS]

Options:

  • --csv_path: Path to the dataset_accounts.csv file. Default is data/dataset_accounts.csv.
  • --skip_collection: If passed, data collection will be skipped.
  • --skip_create_df: If passed, processing the raw CrowdTangle data into a DataFrame will be skipped.
  • --skip_augmentation: If passed, augmenting the DataFrame with additional columns will be skipped. Use this option for collecting data from new accounts not included in the original dataset.
  • --post_df_path: Path to the pickle file holding the processed posts DataFrame. Default is data/df_posts.pkl.
  • --profile_df_path: Path to the pickle file holding the processed profiles DataFrame. Default is data/df_profiles.pkl.
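
To illustrate how these flags combine, here is a hypothetical argparse mirror of the documented options. It is a sketch, not the actual contents of scripts/collect_data.py: the function names `build_parser` and `planned_stages` and the stage labels are assumptions made for illustration, while the flag names and defaults are taken from the list above.

```python
import argparse


def build_parser():
    """Declare the documented options with their documented defaults."""
    parser = argparse.ArgumentParser(
        description="Hypothetical mirror of the collect_data.py options"
    )
    parser.add_argument("--csv_path", default="data/dataset_accounts.csv")
    parser.add_argument("--skip_collection", action="store_true")
    parser.add_argument("--skip_create_df", action="store_true")
    parser.add_argument("--skip_augmentation", action="store_true")
    parser.add_argument("--post_df_path", default="data/df_posts.pkl")
    parser.add_argument("--profile_df_path", default="data/df_profiles.pkl")
    return parser


def planned_stages(args):
    """List the stages that would run for the given flags (labels are illustrative)."""
    stages = []
    if not args.skip_collection:
        stages.append("collection")
    if not args.skip_create_df:
        stages.append("create_df")
    if not args.skip_augmentation:
        stages.append("augmentation")
    return stages


if __name__ == "__main__":
    print(planned_stages(build_parser().parse_args()))
```

For accounts outside the original dataset, passing only `--skip_augmentation` would leave the collection and DataFrame-creation stages enabled in this sketch.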

Troubleshooting

  • Ensure that the .env file exists in the root directory with the correct API_TOKEN.
  • Verify that the CSV file contains the required columns (username, first_post, and last_post) and that dates use the 'YYYY-MM-DD HH:MM:SS' format.

License

MIT

Contact

For any queries or issues, please contact Thales Bertaglia.
