From 000bbc2b1e690faf65fbfd95ae452579c4d08cad Mon Sep 17 00:00:00 2001
From: jasonjabbour
Date: Sun, 17 Nov 2024 23:50:11 -0500
Subject: [PATCH] previously defined terms

---
 .../privacy_security/privacy_security.qmd | 46 +++++++------------
 1 file changed, 16 insertions(+), 30 deletions(-)

diff --git a/contents/core/privacy_security/privacy_security.qmd b/contents/core/privacy_security/privacy_security.qmd
index 8e9cf073..7b7be76d 100644
--- a/contents/core/privacy_security/privacy_security.qmd
+++ b/contents/core/privacy_security/privacy_security.qmd
@@ -503,7 +503,7 @@ Here are some examples of TEEs that provide hardware-based security for sensitiv

* **[Apple SecureEnclave](https://support.apple.com/guide/security/secure-enclave-sec59b0b31ff/web):** A TEE for biometric data and cryptographic key management on iPhones and iPads, facilitating secure mobile payments.

-@fig-enclave is a diagram demonstrating a secure enclave isolated from the host processor to provide an extra layer of security. The secure enclave has a boot ROM to establish a hardware root of trust, an AES engine for efficient and secure cryptographic operations, and protected memory. It also has a mechanism to store information securely on attached storage separate from the NAND flash storage used by the application processor and operating system. This design keeps sensitive user data secure even when the Application Processor kernel becomes compromised.
+@fig-enclave is a diagram demonstrating a secure enclave isolated from the host processor to provide an extra layer of security. The secure enclave has a boot ROM to establish a hardware root of trust, an AES engine for efficient and secure cryptographic operations, and protected memory. It also has a mechanism to store information securely on attached storage separate from the NAND flash storage used by the application processor and operating system. NAND flash is a type of non-volatile storage used in devices like SSDs, smartphones, and tablets to retain data even when powered off. By isolating sensitive data from the NAND storage accessed by the main system, this design ensures user data remains secure even if the application processor kernel is compromised.

![System-on-chip secure enclave. Source: [Apple.](https://support.apple.com/guide/security/secure-enclave-sec59b0b31ff/web)](images/png/System-on-chip_secure_enclave.png){#fig-enclave}

@@ -533,11 +533,11 @@ While Trusted Execution Environments offer significant security benefits, their

#### About

-A Secure Boot is a fundamental security standard that ensures a device only boots using software trusted by the Original Equipment Manufacturer (OEM). During startup, the firmware checks the digital signature of each boot software component, including the bootloader, kernel, and base operating system. This process verifies that the software has not been altered or tampered with. If any signature fails verification, the boot process is halted to prevent unauthorized code execution that could compromise the system’s security integrity.
+Secure Boot is a fundamental security standard that ensures a device only boots using software trusted by the device manufacturer. During startup, the firmware checks the digital signature of each boot software component, including the bootloader, kernel, and base operating system. This process verifies that the software has not been altered or tampered with. If any signature fails verification, the boot process is halted to prevent unauthorized code execution that could compromise the system’s security integrity.
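To make the verification step concrete, the short sketch below mimics the chain-of-trust idea in Python with the `cryptography` package. The vendor key, stage names, and image bytes are hypothetical stand-ins; on real hardware this logic lives in boot ROM and firmware, with the public key anchored in silicon rather than in application code.

```python
# Minimal sketch of Secure Boot's chain of trust (illustrative assumptions only).
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric import ed25519

# Manufacturer signing key; only the public half would ship on the device.
vendor_key = ed25519.Ed25519PrivateKey.generate()
vendor_pub = vendor_key.public_key()

# Each boot stage is distributed as (name, image bytes, signature over the image).
stages = [
    (name, image, vendor_key.sign(image))
    for name, image in [
        ("bootloader", b"bootloader-image-v1"),
        ("kernel", b"kernel-image-v1"),
        ("base_os", b"base-os-image-v1"),
    ]
]

def secure_boot(boot_stages, public_key):
    """Verify every stage before handing off control; halt on the first failure."""
    for name, image, signature in boot_stages:
        try:
            public_key.verify(signature, image)
        except InvalidSignature:
            raise SystemExit(f"Boot halted: {name} failed signature verification")
        print(f"{name}: signature verified, executing next stage")

secure_boot(stages, vendor_pub)
```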
#### Benefits

-The integrity of an embedded machine learning (ML) system is paramount from the moment it is powered on. Any compromise in the boot process can lead to the execution of malicious software before the operating system and ML applications begin, resulting in manipulated ML operations, unauthorized data access, or repurposing the device for malicious activities such as botnets or crypto-mining.
+The integrity of an embedded ML system is paramount from the moment it is powered on. Any compromise in the boot process can lead to the execution of malicious software before the operating system and ML applications begin, resulting in manipulated ML operations, unauthorized data access, or repurposing the device for malicious activities such as botnets or crypto-mining.

Secure Boot offers vital protections for embedded ML hardware through the following critical mechanisms:

@@ -557,15 +557,15 @@ Secure Boot works with TEEs to further enhance system security. @fig-secure-boot

A real-world example of Secure Boot's application can be observed in Apple's Face ID technology, which uses advanced machine learning algorithms to enable [facial recognition](https://support.apple.com/en-us/102381) on iPhones and iPads. Face ID relies on a sophisticated integration of sensors and software to precisely map the geometry of a user's face. For Face ID to operate securely and protect users' biometric data, the device's operations must be trustworthy from initialization. This is where Secure Boot plays a pivotal role. The following outlines how Secure Boot functions in conjunction with Face ID:

-**Initial Verification:** Upon booting up an iPhone, the Secure Boot process commences within the Secure Enclave, a specialized coprocessor designed to add an extra layer of security. The Secure Enclave handles biometric data, such as fingerprints for Touch ID and facial recognition data for Face ID. During the boot process, the system rigorously verifies that Apple has digitally signed the Secure Enclave's firmware, guaranteeing its authenticity. This verification step ensures that the firmware used to process biometric data remains secure and uncompromised.
+1. **Initial Verification:** Upon booting up an iPhone, the Secure Boot process commences within the Secure Enclave, a specialized coprocessor designed to add an extra layer of security. The Secure Enclave handles biometric data, such as fingerprints for Touch ID and facial recognition data for Face ID. During the boot process, the system rigorously verifies that Apple has digitally signed the Secure Enclave's firmware, guaranteeing its authenticity. This verification step ensures that the firmware used to process biometric data remains secure and uncompromised.

-**Continuous Security Checks:** Following the system's initialization and validation by Secure Boot, the Secure Enclave communicates with the device's central processor to maintain a secure boot chain. During this process, the digital signatures of the iOS kernel and other critical boot components are meticulously verified to ensure their integrity before proceeding. This "chain of trust" model effectively prevents unauthorized modifications to the bootloader and operating system, safeguarding the device's overall security.
+2. **Continuous Security Checks:** Following the system's initialization and validation by Secure Boot, the Secure Enclave communicates with the device's central processor to maintain a secure boot chain. During this process, the digital signatures of the iOS kernel and other critical boot components are meticulously verified to ensure their integrity before proceeding. This "chain of trust" model effectively prevents unauthorized modifications to the bootloader and operating system, safeguarding the device's overall security.

-**Face Data Processing:** Once the secure boot sequence is completed, the Secure Enclave interacts securely with the machine learning algorithms that power Face ID. Facial recognition involves projecting and analyzing over 30,000 invisible points to create a depth map of the user's face and an infrared image. This data is converted into a mathematical representation and is securely compared with the registered face data stored in the Secure Enclave.
+3. **Face Data Processing:** Once the secure boot sequence is completed, the Secure Enclave interacts securely with the machine learning algorithms that power Face ID. Facial recognition involves projecting and analyzing over 30,000 invisible points to create a depth map of the user's face and an infrared image. This data is converted into a mathematical representation and is securely compared with the registered face data stored in the Secure Enclave.

-**Secure Enclave and Data Protection:** The Secure Enclave is precisely engineered to protect sensitive data and manage cryptographic operations that safeguard this data. Even in the event of a compromised operating system kernel, the facial data processed through Face ID remains inaccessible to unauthorized applications or external attackers. Importantly, Face ID data is never transmitted off the device and is not stored on iCloud or other external servers.
+4. **Secure Enclave and Data Protection:** The Secure Enclave is precisely engineered to protect sensitive data and manage cryptographic operations that safeguard this data. Even in the event of a compromised operating system kernel, the facial data processed through Face ID remains inaccessible to unauthorized applications or external attackers. Importantly, Face ID data is never transmitted off the device and is not stored on iCloud or other external servers.

-**Firmware Updates:** Apple frequently releases updates to address security vulnerabilities and enhance system functionality. Secure Boot ensures that all firmware updates are authenticated, allowing only those signed by Apple to be installed. This process helps preserve the integrity and security of the Face ID system over time.
+5. **Firmware Updates:** Apple frequently releases updates to address security vulnerabilities and enhance system functionality. Secure Boot ensures that all firmware updates are authenticated, allowing only those signed by Apple to be installed. This process helps preserve the integrity and security of the Face ID system over time.

By integrating Secure Boot with dedicated hardware such as the Secure Enclave, Apple delivers robust security guarantees for critical operations like facial recognition.

@@ -640,7 +640,7 @@ PUF key generation avoids external key storage, which risks exposure. It also pr

#### Utility

-Machine learning models are rapidly becoming a core part of the functionality for many embedded devices, such as smartphones, smart home assistants, and autonomous drones. However, securing ML on resource-constrained embedded hardware can be challenging. This is where physical unclonable functions (PUFs) come in uniquely handy. Let's look at some examples of how PUFs can be useful.
+Machine learning models are rapidly becoming a core part of the functionality for many embedded devices, such as smartphones, smart home assistants, and autonomous drones. However, securing ML on resource-constrained embedded hardware can be challenging. This is where PUFs come in uniquely handy. Let's look at some examples of how PUFs can be useful.

PUFs provide a way to generate unique fingerprints and cryptographic keys tied to the physical characteristics of each chip on the device. Let's take an example. We have a smart camera drone that uses embedded ML to track objects. A PUF integrated into the drone's processor could create a device-specific key to encrypt the ML model before loading it onto the drone. This way, even if an attacker somehow hacks the drone and tries to steal the model, they won't be able to use it on another device!
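As a rough sketch of that drone scenario, the snippet below derives a device-unique key from a simulated PUF response and uses it to encrypt a model blob. The `read_puf_response` helper is hypothetical, standing in for the on-chip PUF query plus the error correction (fuzzy extractor) a real design would need, and the model bytes are a placeholder.

```python
# Illustrative PUF-based model encryption; names and data are stand-in assumptions.
import hashlib
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

def read_puf_response() -> bytes:
    # Hypothetical stand-in for querying the chip's PUF; real hardware would
    # stabilize the noisy response with error correction before use.
    return hashlib.sha256(b"simulated-silicon-manufacturing-variation").digest()

# Derive a device-unique 128-bit AES key from the PUF response.
device_key = hashlib.sha256(read_puf_response()).digest()[:16]

model_bytes = b"example-object-tracking-model-weights"  # stand-in for the real model
nonce = os.urandom(12)
encrypted_model = AESGCM(device_key).encrypt(nonce, model_bytes, None)

# Only hardware that can regenerate the same PUF response recovers the key,
# so the encrypted model is useless when copied to another device.
assert AESGCM(device_key).decrypt(nonce, encrypted_model, None) == model_bytes
```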
@@ -726,17 +726,13 @@ Emerging techniques like differential Privacy, federated learning, and synthetic

Methodologies like Privacy by Design [@cavoukian2009privacy] consider such minimization early in system architecture. Regulations like GDPR also mandate data minimization principles. With a multilayered approach across legal, technical, and process realms, data minimization limits risks in embedded ML products.

-#### Case Study - Performance-Based Data Minimization
+#### Case Study: Performance-Based Data Minimization

Performance-based data minimization [@Biega2020Oper] focuses on expanding upon the third category of data minimization mentioned above, namely _limitation_. It specifically defines the robustness of model results on a given dataset by certain performance metrics, such that data should not be additionally collected if it does not significantly improve performance. Performance metrics can be divided into two categories:

-1. Global data minimization performance
+1. Global data minimization performance: Satisfied if a dataset minimizes the amount of per-user data while its mean performance across all data is comparable to the mean performance of the original, unminimized dataset.

-a. Satisfied if a dataset minimizes the amount of per-user data while its mean performance across all data is comparable to the mean performance of the original, unminimized dataset.
-
-2. Per user data minimization performance
-
-a. Satisfied if a dataset minimizes the amount of per-user data while the minimum performance of individual user data is comparable to that of individual user data in the original, unminimized dataset.
+2. Per-user data minimization performance: Satisfied if a dataset minimizes the amount of per-user data while the minimum performance of individual user data is comparable to that of individual user data in the original, unminimized dataset.

Performance-based data minimization can be leveraged in machine-learning settings, including movie recommendation algorithms and e-commerce settings.
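One way to operationalize the two criteria is sketched below, assuming per-user performance scores are available for both the full and the minimized dataset; the 0.01 tolerance for "comparable" is an illustrative choice, not a value taken from the paper.

```python
# Toy check of the global and per-user data minimization criteria (illustrative scores).
from statistics import mean

full_scores = {"user_a": 0.82, "user_b": 0.74, "user_c": 0.91}        # full data
minimized_scores = {"user_a": 0.81, "user_b": 0.735, "user_c": 0.90}  # less data per user

def global_minimization_ok(full, minimized, tolerance=0.01):
    # Mean performance across all users must remain comparable.
    return mean(full.values()) - mean(minimized.values()) <= tolerance

def per_user_minimization_ok(full, minimized, tolerance=0.01):
    # The worst-off individual user must also remain comparable.
    return min(full.values()) - min(minimized.values()) <= tolerance

print(global_minimization_ok(full_scores, minimized_scores))    # True
print(per_user_minimization_ok(full_scores, minimized_scores))  # True
```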
@@ -932,7 +928,7 @@ Deep learning models have previously been shown to be vulnerable to adversarial

Homomorphic encryption is a form of encryption that allows computations to be carried out on ciphertext, generating an encrypted result that, when decrypted, matches the result of operations performed on the plaintext. For example, multiplying two numbers encrypted with homomorphic encryption produces an encrypted product that decrypts to the actual product of the two numbers. This means that data can be processed in an encrypted form, and only the resulting output needs to be decrypted, significantly enhancing data security, especially for sensitive information.
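The multiplication example can be illustrated with textbook (unpadded) RSA, which happens to be multiplicatively homomorphic; the tiny parameters below are chosen for readability and provide no real security.

```python
# Toy demonstration of a multiplicative homomorphism using textbook RSA.
p, q = 61, 53
n = p * q                   # public modulus
phi = (p - 1) * (q - 1)
e = 17                      # public exponent
d = pow(e, -1, phi)         # private exponent (requires Python 3.8+)

def encrypt(m):
    return pow(m, e, n)

def decrypt(c):
    return pow(c, d, n)

m1, m2 = 7, 6
c1, c2 = encrypt(m1), encrypt(m2)

# Multiplying the ciphertexts yields a ciphertext of the product of the
# plaintexts; the inputs are never decrypted during the computation.
c_product = (c1 * c2) % n
assert decrypt(c_product) == m1 * m2   # 42
```

Practical partially homomorphic schemes such as Paillier (additive) rest on the same principle, while fully homomorphic schemes extend it to arbitrary computations at much higher cost.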
-Homomorphic encryption enables outsourced computation on encrypted data without exposing the data itself to the external party performing the operations. However, only certain computations like addition and multiplication are supported in partially homomorphic schemes. Fully homomorphic encryption (FHE) that can handle any computation is even more complex. The number of possible operations is limited before noise accumulation corrupts the ciphertext.
+Homomorphic encryption enables outsourced computation on encrypted data without exposing the data itself to the external party performing the operations. However, only certain computations like addition and multiplication are supported in partially homomorphic schemes. Fully Homomorphic Encryption (FHE) that can handle any computation is even more complex. The number of possible operations is limited before noise accumulation corrupts the ciphertext.

To use homomorphic encryption across different entities, carefully generated public keys must be exchanged for operations across separately encrypted data. This advanced encryption technique enables previously impossible secure computation paradigms but requires expertise to implement correctly for real-world systems.

@@ -1044,21 +1040,11 @@ Ongoing MPC research closes this efficiency gap through cryptographic advances,

#### Core Idea

-Synthetic data generation has emerged as an important privacy-preserving machine learning approach that allows models to be developed and tested without exposing real user data. The key idea is to train generative models on real-world datasets and then sample from these models to synthesize artificial data that statistically matches the original data distribution but does not contain actual user information. For example, a GAN could be trained on a dataset of sensitive medical records to learn the underlying patterns and then used to sample synthetic patient data.
-
-The primary challenge of synthesizing data is to ensure adversaries cannot re-identify the original dataset. A simple approach to achieving synthetic data is adding noise to the original dataset, which still risks privacy leakage. When noise is added to data in the context of differential privacy, sophisticated mechanisms based on the data's sensitivity are used to calibrate the amount and distribution of noise. Through these mathematically rigorous frameworks, differential Privacy generally guarantees Privacy at some level, which is the primary goal of this privacy-preserving technique. Beyond preserving privacy, synthetic data combats multiple data availability issues such as imbalanced datasets, scarce datasets, and anomaly detection.
-
-Researchers can freely share this synthetic data and collaborate on modeling without revealing private medical information. Well-constructed synthetic data protects Privacy while providing utility for developing accurate models. Key techniques to prevent reconstructing the original data include adding differential privacy noise during training, enforcing plausibility constraints, and using multiple diverse generative models. Here are some common approaches for generating synthetic data:
-
-* **Generative Adversarial Networks (GANs):** GANs are an AI algorithm used in unsupervised learning where two neural networks compete against each other in a game. @fig-gans is an overview of the GAN system. The generator network (big red box) is responsible for producing the synthetic data, and the discriminator network (yellow box) evaluates the authenticity of the data by distinguishing between fake data created by the generator network and the real data. The generator and discriminator networks learn and update their parameters based on the results. The discriminator acts as a metric on how similar the fake and real data are to one another. It is highly effective at generating realistic data and is a popular approach for generating synthetic data.
-
-![Flowchart of GANs. Source: @rosa2021.](images/png/Flowchart_of_GANs.png){#fig-gans}
-
-* **Variational Autoencoders (VAEs):** VAEs are neural networks capable of learning complex probability distributions and balancing data generation quality and computational efficiency. They encode data into a latent space where they learn the distribution to decode the data back.
+Synthetic data generation has emerged as an important privacy-preserving machine learning approach that allows models to be developed and tested without exposing real user data. The key idea is to train generative models on real-world datasets and then sample from these models to synthesize artificial data that statistically matches the original data distribution but does not contain actual user information. For instance, techniques like GANs, VAEs, and data augmentation can be used to produce synthetic data that mimics real datasets while preserving privacy. Simulations are also commonly employed in scenarios where synthetic data must represent complex systems, such as in scientific research or urban planning.

-* **Data Augmentation:** This involves transforming existing data to create new, altered data. For example, flipping, rotating, and scaling (uniformly or non-uniformly) original images can help create a more diverse, robust image dataset before training an ML model.
+The primary challenge of synthesizing data is to ensure adversaries cannot re-identify the original dataset. A simple approach to achieving synthetic data is adding noise to the original dataset, which still risks privacy leakage. When noise is added to data in the context of differential privacy, sophisticated mechanisms based on the data's sensitivity are used to calibrate the amount and distribution of noise. Through these mathematically rigorous frameworks, differential privacy generally guarantees privacy at some level, which is the primary goal of this technique. Beyond preserving privacy, synthetic data combats multiple data availability issues such as imbalanced datasets, scarce datasets, and anomaly detection.

-* **Simulations:** Mathematical models can simulate real-world systems or processes to mimic real-world phenomena. This is highly useful in scientific research, urban planning, and economics.
+Researchers can freely share this synthetic data and collaborate on modeling without revealing private medical information. Well-constructed synthetic data protects privacy while providing utility for developing accurate models. Key techniques to prevent reconstructing the original data include adding differential privacy noise during training, enforcing plausibility constraints, and using multiple diverse generative models.
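As a minimal sketch of that noise-calibration idea, the snippet below releases a differentially private histogram of a sensitive attribute using the Laplace mechanism and then samples synthetic records from it. The age data, bin width, and epsilon value are illustrative assumptions rather than recommended settings.

```python
# Toy differentially private synthetic data via a noisy histogram (NumPy only).
import numpy as np

rng = np.random.default_rng(0)
ages = rng.integers(18, 90, size=1_000)        # stand-in for a sensitive attribute

bins = np.arange(18, 95, 5)
counts, edges = np.histogram(ages, bins=bins)

epsilon = 1.0
sensitivity = 1.0                               # one person changes one bin count by 1
noisy_counts = counts + rng.laplace(scale=sensitivity / epsilon, size=counts.shape)
noisy_counts = np.clip(noisy_counts, 0, None)   # counts cannot be negative

# Sample synthetic ages from the privatized histogram.
probs = noisy_counts / noisy_counts.sum()
synthetic_bins = rng.choice(len(probs), size=1_000, p=probs)
synthetic_ages = rng.uniform(edges[synthetic_bins], edges[synthetic_bins + 1])
```

Generative approaches such as GANs or VAEs replace the histogram with a learned model, but the same idea of calibrating noise to the data's sensitivity underlies differentially private training of those models as well.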
#### Benefits