diff --git a/Content/20230717135201-data_engineering.org b/Content/20230717135201-data_engineering.org index 8cafe16..8214cf3 100644 --- a/Content/20230717135201-data_engineering.org +++ b/Content/20230717135201-data_engineering.org @@ -4,23 +4,10 @@ #+title: Data Engineering #+filetags: :data: -* Stream -** 0x22F4 - - see [[id:869abfbd-031b-40a0-9c4b-69c3e7d820ab][real-time data streaming]] and [[id:f4135d2f-3390-4d76-b05a-222f910c10d4][batch computing]] -** 0x22F2 - - see [[id:1656ed9e-9ed0-4ddb-9953-98189f6bb42e][Extract, Transform, Load]] - - see [[id:710e11f8-780a-4aa5-84fc-c0ab9bb848c0][Big Data]] -** 0x22F2 - - starting out with setting up a data lake - - reading on the book : fundamentals of data engineering - - will be populating a lot of relevant nodes that demand further exploration - - all tagged as =:data:= - - do have foundational data science experience; intrigued to explore how it scales out operationally. - - will not be able to do and end to end over view of the whole field. - - will be indexing into relevant nodes from the stream sub nodes instead. * Core Nodes ** Data Engineering Lifecycle *** Overview +**** Overall Flow #+begin_src plantuml :file ./images/data-eng-lifecycle.png :exports both @startuml @@ -67,17 +54,77 @@ Serving =right=> Applications - they also need to have side-car processing capabilities to serve complex queries - storage is omnipresent across the cycle from ingestion to serving results and the transformations sandwiched within - streaming frameworks like [[id:fa58feb4-25a2-40f1-8533-cafcb0d3886b][apache kafka]] and [[id:5e438030-0096-4b97-8931-f99eb7b738c5][pulsar]] can simultaneously function as ingestion, storage and query systems for messages -**** Ingestion +**** [[id:5cc98814-915c-4e20-a8e5-82ddd6783466][Ingestion]] **** Transformation + +In the data engineering lifecycle, the transformation process is a critical stage where raw data is converted into a suitable format for analysis and utilization. Here are the key aspects of the transformation process: + +- *Extraction*: + - Raw data is sourced from multiple origins, including databases, external data feeds, sensors, and more. + +- *Data Cleaning*: + - Removing duplicates, correcting errors, and filling in missing values to ensure data quality. + - Standardizing data formats and naming conventions for consistency. + +- *Data Integration*: + - Combining data from different sources to provide a unified view. + - Resolving heterogeneities and conflicts in data schemas. + +- *Data Transformation*: + - Changing data from its original form into a format that is analyzable. This includes: + - *Normalization/Denormalization*: Adjusting the data structure for better access or storage. + - *Aggregation*: Summarizing data to provide insights at a higher level. + - *Enrichment*: Adding new data fields derived from existing data to enhance context. + +- *Filtering*: + - Removing unnecessary or irrelevant data to focus on what's important. + +- *Feature Engineering*: + - Creating new variables or modifying existing ones to improve the performance of models. + +- *Validation*: + - Ensuring that transformed data meets quality and integrity standards. + - Conducting checks against business rules and expectations. + +***** *Connections and Importance*: +- The transformation process is intrinsically connected to subsequent stages of data analytics and machine learning, as the quality and structure of transformed data directly impact the performance of analytics models. 
+- It ensures that data is suitable for storage in a data warehouse or data lake, where further data exploration can occur.
+- By transforming data appropriately, businesses can derive actionable insights that drive strategic decisions.
+
 **** Serving
-**** Applications
+ - [[id:552f0396-488d-43d8-8b44-f68dff74fa5e][Analytics]]
+ - [[id:49b0dd1e-ca9e-46fa-a0b9-db0ec330833d][MultiTenancy]]
+ - [[id:20230713T110006.406161][Machine Learning]]
+ - Reverse [[id:1656ed9e-9ed0-4ddb-9953-98189f6bb42e][ETL]]
 *** Undercurrents
 **** [[id:6e9b50dc-c5c0-454d-ad99-e6b6968b221a][Security]]
+ - Access Control for:
+   - Data
+   - Systems
+ - [[id:d4f81cb7-e01b-4115-b8a1-9a303a82699d][The Principle of Least Privilege]]
 **** Data Management
+ - Data Governance
+   - Discoverability
+   - Definitions
+   - Accountability
+ - Data Modeling
+ - Data Integrity
 **** DataOps
+ - Data Governance
+ - Observability and Monitoring
+ - Incident Reporting
 **** Data Architecture
+ - Analyse tradeoffs
+ - Design for agility
+ - Add value to the business
 **** [[id:f822f8f6-89eb-4aa8-ac8f-fdcff3f06fb9][Orchestration]]
+ - Coordinate workflows
+ - Schedule jobs
+ - Manage tasks
 **** [[id:5c2039f5-0c44-4926-b2d7-a8bf471923ac][Software Engineering]]
+ - Programming and coding skills
+ - Software Design Patterns
+ - Testing and Debugging
 *** [[id:9204583f-13ab-4039-9bfc-453700f8b0d1][The Data Life Cycle]]
 - The data engineering lifecycle is a subset of the data life cycle (explored separately)
 ** [[id:710e11f8-780a-4aa5-84fc-c0ab9bb848c0][Big Data]]
@@ -93,4 +140,7 @@
 ** [[id:a34cc866-ec4b-44f5-972f-1c12782f649d][Presto]]
 * Resources
 ** Books
-  - Fundamentals of Data Engineering
+*** Fundamentals of Data Engineering
+** Articles
+*** Data Observability Driven Development
+  - https://www.kensu.io/blog/a-guide-to-understanding-data-observability-driven-development
diff --git a/Content/20230720113957-graphs.org b/Content/20230720113957-graphs.org
index bdc32d3..dba0465 100644
--- a/Content/20230720113957-graphs.org
+++ b/Content/20230720113957-graphs.org
@@ -11,9 +11,6 @@ A mathematical protocol used to represent suitable abstractions using nodes and
 - Edges : a connection with optional properties between entities
 * Prominent Variants
 ** Directed Acyclic Graph
-:PROPERTIES:
-:ID: d07976cd-5194-484e-82ab-8c55e064eeb1
-:END:
 - directed edges, no cycles
 - many applications
 - check out [[id:78d16b5e-1893-4057-bc22-b2c9a3ca7ed6][Topological Sort]] for a practical application
diff --git a/Content/20230720114059-data.org b/Content/20230720114059-data.org
index c8ddf5e..38380a3 100644
--- a/Content/20230720114059-data.org
+++ b/Content/20230720114059-data.org
@@ -10,6 +10,5 @@
 - I'm not sure what particular mechanism/algorithm might be an accurate representation of how one performs search over one's [[id:401e1c2b-fc54-4bee-9a38-d084b8904693][Memory (Mind).]]
 - I personally feel like I'm unconsciously employing a hierarchical index (a [[id:1d703f5b-8b5e-4c82-9393-a2c88294c959][Graph]]) that leads me from a root event to some specifics conveniently, but I have yet to fully test this hypothesis; I'll refrain from academic exploration and proceed with good old Greek thought experiments.
 ** [[id:20230713T110006.406161][Machine Learning]]
-** [[id:2f67eca9-5076-4895-828f-de3655444ee2][DataBase]]
-** [[id:2cc32697-c4ce-41b8-987a-2a44a09f78c3][MapReduce]]
-** [[id:665e997a-5628-4481-902c-47af4ba30336][Logs]]
+** [[id:e9d75f9d-f8bf-4125-beb0-8ca34166ce9e][Data Engineering]]
+** [[id:552f0396-488d-43d8-8b44-f68dff74fa5e][Analytics]]
diff --git a/Content/20240114203847-agent.org b/Content/20240114203847-agent.org
index aa2b3c4..1edd0ce 100644
--- a/Content/20240114203847-agent.org
+++ b/Content/20240114203847-agent.org
@@ -9,3 +9,5 @@ An [[id:20240114T203601.390070][Entity]] with the ability (at least) to perform
 Complex versions thereof might delve into the ability to think, comprehend, formulate, and make decisions over the basic requirement of being capable of acting.
 
 An Agent that only thinks and doesn't act (or isn't capable of consequences) isn't an entity of significant consequence.
+
+See [[id:a819cd68-91f9-4d67-b40f-fc37324f708b][Agentic AI]]
diff --git a/Content/20240220114146-electronic_storage.org b/Content/20240220114146-electronic_storage.org
index 381b75b..c5e87e5 100644
--- a/Content/20240220114146-electronic_storage.org
+++ b/Content/20240220114146-electronic_storage.org
@@ -1,7 +1,61 @@
 :PROPERTIES:
 :ID: 18491388-2dcc-488f-8f33-00582cf0f77e
 :END:
-#+title: Electronic Storage
-#+filetags: :electronics:cs:
+#+title: Storage
+#+filetags: :data:cs:
 
-* Sentinels
+* Overview
+** *Types of Storage*:
+  - Primary Storage: Also known as volatile memory or RAM; used by computers to temporarily store data that is actively being used or processed.
+  - Secondary Storage: Non-volatile storage like hard drives (HDDs), solid-state drives (SSDs), and optical discs, where data is stored for long-term retention.
+  - Tertiary Storage: Storage systems used for archiving and backup, such as tape drives or cloud-based cold storage solutions.
+  - Quaternary Storage: A rarely used term; sometimes refers to off-site storage systems or lesser-used forms like microforms.
+
+** *Storage Technologies*:
+  - Magnetic Storage: Utilizes magnetic media to store data (e.g., HDDs, magnetic tapes).
+  - Optical Storage: Uses lasers to read/write data (e.g., CDs, DVDs, Blu-rays).
+  - Flash Storage: A form of EEPROM; non-volatile storage technology used in SSDs and USB flash drives.
+  - Cloud Storage: Allows data to be stored and accessed over the internet, offered by providers like AWS, Google Cloud, Azure.
+
+** *Key Concepts*:
+  - Volatility: Determines whether storage retains data when power is lost.
+  - Capacity: Amount of data a storage medium can hold.
+  - Speed: Access time and data transfer rates of a storage medium.
+  - Durability: Resistance to physical wear and data deterioration over time.
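+
+To make capacity and speed concrete, here's a back-of-the-envelope sketch; the throughput figures are rough assumptions for illustration, not benchmarks:
+
+#+begin_src python
+# Rough model: how long does one full sequential read of a dataset take
+# on different storage tiers? Throughput values are illustrative assumptions.
+TIER_THROUGHPUT_MBPS = {
+    "RAM (primary)": 20_000,
+    "NVMe SSD (secondary)": 3_000,
+    "HDD (secondary)": 150,
+    "Tape (tertiary)": 300,  # ignores mount/seek latency, which dominates
+}
+
+def full_scan_seconds(dataset_gb: float, throughput_mbps: float) -> float:
+    """Seconds to stream the entire dataset once at the given throughput."""
+    return dataset_gb * 1024 / throughput_mbps
+
+for tier, mbps in TIER_THROUGHPUT_MBPS.items():
+    print(f"{tier:>22}: {full_scan_seconds(500, mbps):9.1f} s for a 500 GB scan")
+#+end_src
+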
+* Misc
+** Understanding Data Access Frequency
+*** Temperatures
+**** Hot Data
+  - accessed frequently, often many times per day
+  - could be several times per second
+**** Cold Data
+  - seldom queried
+  - often retained for compliance purposes
+  - backups in cases of catastrophic failures
+** Handy Questions to Evaluate Storage Systems
+
+These questions help gauge the choice of storage system when architecting a data solution such as a:
+  - [[id:cfa5fba0-eb2d-4e71-b17a-c646149ab27e][data warehouse]]
+  - [[id:796b4db7-42dc-4783-bb05-b15524ddf117][data lakehouse]]
+  - [[id:2f67eca9-5076-4895-828f-de3655444ee2][database]]
+  - [[id:add20973-54a9-4d96-a938-b27ccbf9c1e6][object storage]]
+
+** Questions
+  - Is this storage solution compatible with the architecture's required read and write speeds?
+  - Will storage create a bottleneck for downstream processes?
+  - Do you understand how this storage technology works?
+  - Are you using the storage system optimally or committing unnatural acts?
+    - For instance, are you applying a high rate of random-access updates in object storage (an antipattern)?
+  - Will this storage system handle anticipated future scale?
+    - You should consider all capacity limits on the storage system: total available storage, read operation rate, write volume, etc.
+  - Will downstream users and processes be able to retrieve data in the required [[id:079db37b-925c-478a-836f-7f6ce8027108][service level agreement]]?
+  - Are you capturing [[id:5c5245d1-4919-4e13-9232-410f324c0288][metadata]] about the schema evolution, data flows, data lineage, and so forth?
+    - Metadata has a significant impact on the utility of data.
+    - Metadata represents an investment in the future, dramatically enhancing discoverability and institutional knowledge to streamline future projects and architecture changes.
+  - Is this a pure storage solution (object storage), or does it support complex query patterns (e.g., a cloud data warehouse)?
+  - Is the storage system schema-agnostic (object storage)? Flexible schema (Cassandra)? Enforced schema (a cloud data warehouse)?
+  - How are you tracking master data, golden records, data quality, and data lineage for data governance?
+  - How are you handling regulatory compliance and data sovereignty? For example, can you store your data in certain geographical locations but not others?
+* Relevant Nodes
+  - [[id:e9d75f9d-f8bf-4125-beb0-8ca34166ce9e][Data Engineering]]
+  - [[id:1073cfed-a09d-48b6-bd52-ba09708699bf][Message Brokers]]
diff --git a/Content/20240807085439-compliance.org b/Content/20240807085439-compliance.org
index a4e8145..0ab239b 100644
--- a/Content/20240807085439-compliance.org
+++ b/Content/20240807085439-compliance.org
@@ -1,5 +1,5 @@
 :PROPERTIES:
 :ID: 06cb8fe6-cf1e-4c0c-afdc-f16ab38414ef
 :END:
-#+title: compliance
+#+title: Compliance
 #+filetags: :bs:
diff --git a/Content/20241029124743-etl.org b/Content/20241029124743-etl.org
index ec23e0f..0f4ba0f 100644
--- a/Content/20241029124743-etl.org
+++ b/Content/20241029124743-etl.org
@@ -24,5 +24,35 @@
 - Helps organizations in making informed business decisions through data insights.
 - Enables better data integration across disparate data sources.
 
-* Relevant Nodes
-** [[id:015cb100-bd71-4e98-ae7f-03d547b048e5][ELT]] (Extract, Load, Transform)
+* Reverse ETL
+Reverse ETL is a concept within data management and analytics, specifically within the broader context of data integration and transformation processes.
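+As a first taste of the idea (unpacked in the sections below), here's a minimal sketch; SQLite stands in for the warehouse, and the CRM endpoint and payload shape are hypothetical:
+
+#+begin_src python
+# Minimal reverse-ETL sketch: read an aggregate from the "warehouse"
+# (SQLite as a stand-in) and push it back into an operational tool.
+# The CRM URL and payload are hypothetical; dedicated tools handle
+# retries, batching, and schema mapping in practice.
+import json
+import sqlite3
+import urllib.request
+
+warehouse = sqlite3.connect("warehouse.db")  # assumes an existing orders table
+rows = warehouse.execute(
+    "SELECT customer_id, SUM(amount) FROM orders GROUP BY customer_id"
+).fetchall()
+
+for customer_id, lifetime_value in rows:
+    body = json.dumps({"customer_id": customer_id,
+                       "lifetime_value": lifetime_value}).encode()
+    request = urllib.request.Request(
+        f"https://crm.example.com/api/customers/{customer_id}",  # hypothetical
+        data=body,
+        headers={"Content-Type": "application/json"},
+        method="PATCH",
+    )
+    urllib.request.urlopen(request)  # no error handling; a sketch, not a client
+#+end_src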
+
+** *Definition*:
+  - Reverse ETL refers to the process of moving data from a centralized data warehouse or data lake back to operational systems (like CRM, marketing tools, or sales platforms) to make it actionable for various business operations.
+
+** *Components Involved*:
+  - *ETL Process*: Extract, Transform, Load (ETL) traditionally involves moving data from operational systems into a [[id:cfa5fba0-eb2d-4e71-b17a-c646149ab27e][data warehouse]] for analysis.
+  - *Reverse Process*: Reverse ETL involves taking insights or aggregated data from the data warehouse and pushing it back into operational tools for real-time business use.
+
+** *Purpose*:
+  - Operationalize data insights, allowing teams to act based on centralized data analysis directly within their tools.
+  - Enhance decision-making with enriched data that is more comprehensively processed within data warehouses.
+
+** *Technologies & Tools*:
+  - Tools like Census, Hightouch, and Grouparoo specifically cater to reverse ETL functions, enabling data movement back into operational systems.
+  - These tools integrate with data warehouses such as Snowflake, BigQuery, or Redshift.
+
+** *Use Cases*:
+  - Enabling marketing automation systems with enhanced customer insights.
+  - Sending consolidated sales information to CRM systems for better customer interaction.
+  - Real-time reporting and alert systems by integrating analyzed data back into business operations.
+
+** *Challenges*:
+  - Data Consistency: Ensuring that the data quality and structure remain consistent across data transfer.
+  - Data Latency: Balancing real-time data needs against the feasibility of processing and transferring large datasets swiftly.
+  - Complexity and Maintenance: Managing the transformations and keeping the system up-to-date with data warehouse changes.
+
+** *Connections*:
+- *Data Warehousing*: Fundamental to Reverse ETL, as it acts as the central repository from which data is drawn.
+- *ETL and [[id:015cb100-bd71-4e98-ae7f-03d547b048e5][ELT]] Processes*: Provide the framework necessary for data preparation before reverse ETL can occur.
+- *Data Governance*: Crucial in maintaining the integrity and security of the data as it moves to and from different platforms.
diff --git a/Content/20241101175831-cache.org b/Content/20241101175831-cache.org
index aa9ae1a..438beb1 100644
--- a/Content/20241101175831-cache.org
+++ b/Content/20241101175831-cache.org
@@ -4,6 +4,3 @@
 #+title: Cache
 #+filetags: :data:
-
- - high speed memory taking advantage of the temporal locality of reference principle -> recenlty accessed data is likely to be accessed again.
-
- - caches are a good first step towards improving a [[id:2f67eca9-5076-4895-828f-de3655444ee2][DataBase's]] performance under multiple accesses.
diff --git a/Content/20241101181744-data_lakehouse.org b/Content/20241101181744-data_lakehouse.org new file mode 100644 index 0000000..649d215 --- /dev/null +++ b/Content/20241101181744-data_lakehouse.org @@ -0,0 +1,5 @@ +:PROPERTIES: +:ID: 796b4db7-42dc-4783-bb05-b15524ddf117 +:END: +#+title: Data Lakehouse +#+filetags: :data: diff --git a/Content/20241101181803-object_storage.org b/Content/20241101181803-object_storage.org new file mode 100644 index 0000000..55d8f41 --- /dev/null +++ b/Content/20241101181803-object_storage.org @@ -0,0 +1,5 @@ +:PROPERTIES: +:ID: add20973-54a9-4d96-a938-b27ccbf9c1e6 +:END: +#+title: Object Storage +#+filetags: :data: diff --git a/Content/20241101182156-service_level_agreement.org b/Content/20241101182156-service_level_agreement.org new file mode 100644 index 0000000..29bfeb6 --- /dev/null +++ b/Content/20241101182156-service_level_agreement.org @@ -0,0 +1,6 @@ +:PROPERTIES: +:ID: 079db37b-925c-478a-836f-7f6ce8027108 +:ROAM_ALIASES: SLA +:END: +#+title: Service Level Agreement +#+filetags: :bs: diff --git a/Content/20241101182746-metadata.org b/Content/20241101182746-metadata.org new file mode 100644 index 0000000..d87d871 --- /dev/null +++ b/Content/20241101182746-metadata.org @@ -0,0 +1,5 @@ +:PROPERTIES: +:ID: 5c5245d1-4919-4e13-9232-410f324c0288 +:END: +#+title: MetaData +#+filetags: :data:meta: diff --git a/Content/20241101190815-ingestion.org b/Content/20241101190815-ingestion.org new file mode 100644 index 0000000..83c2520 --- /dev/null +++ b/Content/20241101190815-ingestion.org @@ -0,0 +1,84 @@ +:PROPERTIES: +:ID: 5cc98814-915c-4e20-a8e5-82ddd6783466 +:END: +#+title: Data Ingestion +#+filetags: :cs:data: + +* Overview +** *Definition*: +Data ingestion refers to the process of transporting data from various sources to a storage medium where it can be accessed, used, and analyzed. + +** *Sources of Data*: + - Data can be ingested from diverse sources such as [[id:2f67eca9-5076-4895-828f-de3655444ee2][databases]], cloud [[id:18491388-2dcc-488f-8f33-00582cf0f77e][storage]], [[id:20240101T073142.439145][APIs]], [[id:b8f679c7-3ac1-48d7-b1b5-8e4743a62767][IoT]] devices, and social media platforms. + +** *Data Formats*: + - Data ingestion tools must handle multiple data formats, including structured, unstructured, and semi-structured data formats like JSON, CSV, XML, Avro, Parquet, etc. + +** Types +*** *Temporality*: +**** Batch Processing +- Definition: Data is collected over a period, then processed as a single unit or batch. +- Latency: Typically associated with high latency, as it waits for a complete dataset before processing. +- Use Cases: Ideal for scenarios where up-to-date data is not crucial, such as end-of-day reporting, ETL processes, and periodic data integrations. +- Scalability: Generally scalable for large volumes of data, since processing can be done in bulk. +- Complexity: Simpler to implement in comparison to streaming, often utilizing traditional databases and data warehouses. + +**** Streaming Processing +- Definition: Data is processed in real-time or near-real-time as it arrives. +- Latency: Low latency, providing immediate or timely processing of information. +- Use Cases: Suited for applications requiring instant data processing like fraud detection, live event monitoring, and online recommendation systems. +- Scalability: Can handle continuous data flows which may require distributed processing systems to scale effectively. 
+- Complexity: More complex to implement due to the requirement of managing data flow, consistency, and processing order.
+
+**** Connections and Considerations:
+- Data Volume & Velocity: Batch is preferable for high-volume, less frequent transactions, whereas streaming better handles continuous flows of data.
+- Data Consistency & Accuracy: Consider how eventual consistency or exactly-once semantics would impact your application; these are more challenging to guarantee in streaming systems.
+- Infrastructure & Cost: Streaming might require more sophisticated and potentially costly infrastructure to maintain low latency.
+- Business Needs: Analyze whether the nature of your business operations aligns more closely with periodic updates or ongoing, real-time data insights.
+*** *Mechanisms*
+**** Push
+  - Definition: Data is sent from the source to the destination proactively.
+  - Use Cases: Suitable for real-time or near-real-time data applications.
+  - Advantages:
+    - Lower latency since data is sent as soon as it's available.
+    - Simplicity for the source as it only needs to send data to the target.
+  - Disadvantages:
+    - More complex error handling required by the destination to manage unexpected data arrival.
+    - Potentially more challenging to scale if the source needs to send data to multiple destinations.
+**** Pull
+  - Definition: The destination requests and retrieves data from the source.
+  - Use Cases: Ideal for periodic batch data processing.
+  - Advantages:
+    - The destination controls the rate and timing of data retrieval, simplifying error management and processing.
+    - Easier to manage retries and failed data retrievals.
+  - Disadvantages:
+    - Higher latency, as data is retrieved based on the destination's schedule.
+    - Increased complexity on the destination side, as it must implement scheduling and data checking mechanisms.
+**** Connections and Considerations
+- Latency: Push systems generally have lower latency than pull systems since they send data immediately upon availability.
+- Scalability: Pull systems might offer better scalability if multiple consumers are polling from the same source, while push systems can become complex if the source pushes data to many destinations.
+- Resource Management: Push systems require proactive resource management by the source, while pull systems require it by the destination.
+- Error Handling: Pull systems often have built-in mechanisms to handle intermittent retrieval failures, while push systems require robust error-handling frameworks at the source to detect and retry failed deliveries (see the sketch at the end of this file).
+** *Challenges in Data Ingestion*:
+  - Scalability: Managing increasing volumes of data efficiently.
+  - Data Quality: Ensuring the accuracy and consistency of data being ingested.
+  - Latency: Minimizing delays from data source to destination.
+  - Security: Protecting data during ingestion from unauthorized access or corruption.
+
+** *Best Practices*:
+  - Ensuring data quality and cleansing before ingestion.
+  - Implementing robust error handling mechanisms.
+  - Using scalable solutions that can adapt to growing data inflows.
+  - Monitoring the ingestion process continuously to detect and fix issues early.
+
+* Misc
+** Key Engineering Considerations for Ingestion Phase
+- What are the use cases for the data I'm ingesting?
+  - Can I use this data rather than creating multiple versions of the same dataset?
+- Are the systems generating and ingesting this data reliably, and is the data available when I need it?
+- What is the data destination after ingestion?
+- How frequently will I need to access the data?
+- In what volume will the data typically arrive?
+- What format is the data in? Can my downstream storage and transformation systems handle this format?
+- Is the source data in good shape for immediate downstream use? If so, for how long, and what may cause it to be unusable?
+- If the data is from a streaming source, does it need to be transformed before reaching its destination? Would an in-flight transformation be appropriate, where the data is transformed within the stream itself?
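+
+The sketch referenced in the Mechanisms subsection above, reducing push and pull to bare control flow; the queue stands in for any transport, and the names are illustrative rather than any particular library's API:
+
+#+begin_src python
+# Push vs. pull in miniature: the source pushes into a shared buffer as
+# soon as data exists; the destination pulls on its own schedule.
+import queue
+import threading
+import time
+
+buffer: "queue.Queue[int]" = queue.Queue()
+
+def push_source() -> None:
+    """Push side: the source controls timing and sends proactively."""
+    for event in range(3):
+        buffer.put(event)
+        time.sleep(0.01)
+
+def pull_destination(polls: int = 5, interval: float = 0.02) -> None:
+    """Pull side: the destination controls rate, timing, and retries."""
+    for _ in range(polls):
+        time.sleep(interval)
+        try:
+            print(f"pulled {buffer.get_nowait()}")
+        except queue.Empty:
+            print("nothing yet; retry on the next poll")  # natural retry point
+
+producer = threading.Thread(target=push_source)
+producer.start()
+pull_destination()
+producer.join()
+#+end_src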
diff --git a/Content/20241102075313-agentic_ai.org b/Content/20241102075313-agentic_ai.org
new file mode 100644
index 0000000..2425be9
--- /dev/null
+++ b/Content/20241102075313-agentic_ai.org
@@ -0,0 +1,21 @@
+:PROPERTIES:
+:ID: a819cd68-91f9-4d67-b40f-fc37324f708b
+:END:
+#+title: Agentic AI
+#+filetags: :ai:
+
+* Overview
+
+- *Definition of Agentic AI*:
+  - An AI system designed to make decisions and take actions autonomously.
+  - Operates with a level of agency that allows it to pursue specified goals.
+
+- *Characteristics*:
+  - Autonomy: The ability to operate without human intervention.
+  - Goal-Oriented: Designed to achieve specific objectives.
+  - Adaptivity: Capable of learning and adjusting strategies based on new information.
+
+- *Applications*:
+  - Robotics: Autonomous robots that perform tasks such as exploration or maintenance.
+  - Virtual Assistants: Systems like Siri or Alexa that can act based on user inputs.
+  - Autonomous Vehicles: Self-driving cars with decision-making capabilities.
diff --git a/Content/20241102080944-analytics.org b/Content/20241102080944-analytics.org
new file mode 100644
index 0000000..c02bcf8
--- /dev/null
+++ b/Content/20241102080944-analytics.org
@@ -0,0 +1,55 @@
+:PROPERTIES:
+:ID: 552f0396-488d-43d8-8b44-f68dff74fa5e
+:END:
+#+title: Analytics
+#+filetags: :data:
+
+* Overview
+** *Definition and Scope*
+  - Analytics refers to the systematic computational analysis of [[id:d45dae92-5148-4220-b8dd-e4da80674053][data]] or [[id:ed67b732-55bc-40a5-97d8-b9d16311e959][statistics]].
+  - It is used for the discovery, interpretation, and communication of meaningful patterns in data.
+
+** *Types of Analytics*
+*** Fundamental
+  - *Descriptive Analytics*: Analyzes historical data to understand changes over time.
+  - *Predictive Analytics*: Uses statistical models and forecasting techniques to anticipate future outcomes.
+  - *Prescriptive Analytics*: Suggests actions you can take to affect desired outcomes.
+*** Application-based
+**** *Business Intelligence (BI):*
+  - *Purpose:* Involves the use of data analysis tools to provide historical, current, and predictive views of business operations.
+  - *Components:*
+    - Data mining
+    - Process analysis
+    - Performance benchmarking
+    - Descriptive analytics
+  - *Outcomes:* Helps in making informed business decisions by highlighting trends and insights through dashboards and reports.
+
+**** *Operational Analytics:*
+  - *Purpose:* Focuses on monitoring and analyzing current business operations to improve efficiency and effectiveness on an ongoing basis.
+  - *Components:*
+    - Real-time data processing
+    - Forecasting for operational efficiency
+    - Optimization of business processes
+  - *Outcomes:* Provides actionable insights to refine and optimize internal operations, enhancing productivity and minimizing waste.
+
+**** *Embedded Analytics:*
+  - *Purpose:* Integrates analytic capabilities directly into existing applications and business processes.
+ - *Components:* + - Seamless integration into software applications + - Accessible analytics within operational workflows + - Enhanced user engagement through context-specific insights + - *Outcomes:* Offers users the ability to interact with analytics in real-time within the context of their daily tasks, driving informed decision-making as part of their standard operations. + +**** *Connections:* + - *BI vs. Operational Analytics:* While BI is more about strategic insights and long-term decision-making, operational analytics zeros in on the day-to-day workings and is more tactical. + - *Operational vs. Embedded Analytics:* Both deal with real-time data, but embedded analytics specifically focuses on integrating insights directly into the operational tools users are already employing. + - *BI vs. Embedded Analytics:* BI typically stands alone as a platform that requires user interaction to analyze data, whereas embedded analytics brings the insights into the tools and systems the user already interacts with daily. + +** *Challenges* + - Data Quality: Ensuring the accuracy and completeness of data. + - Data Integration: Combining data from different sources. + - Privacy and Security: Protecting sensitive data from breaches and misuse. + +** *Connections:* +- The type of analytics used often correlates with business objectives, such as forecasting demand (predictive) or improving operational efficiency (prescriptive). +- Tools and technologies are selected based on the data complexity, volume, and specific use cases within an organization. diff --git a/Content/20241102081419-multitenancy.org b/Content/20241102081419-multitenancy.org new file mode 100644 index 0000000..f298967 --- /dev/null +++ b/Content/20241102081419-multitenancy.org @@ -0,0 +1,5 @@ +:PROPERTIES: +:ID: 49b0dd1e-ca9e-46fa-a0b9-db0ec330833d +:END: +#+title: MultiTenancy +#+filetags: :meta: diff --git a/Content/20241102081632-statistics.org b/Content/20241102081632-statistics.org new file mode 100644 index 0000000..e33c2b9 --- /dev/null +++ b/Content/20241102081632-statistics.org @@ -0,0 +1,5 @@ +:PROPERTIES: +:ID: ed67b732-55bc-40a5-97d8-b9d16311e959 +:END: +#+title: Statistics +#+filetags: :math: diff --git a/Content/20241102085344-the_principle_of_least_privilege.org b/Content/20241102085344-the_principle_of_least_privilege.org new file mode 100644 index 0000000..69bff3e --- /dev/null +++ b/Content/20241102085344-the_principle_of_least_privilege.org @@ -0,0 +1,30 @@ +:PROPERTIES: +:ID: d4f81cb7-e01b-4115-b8a1-9a303a82699d +:ROAM_ALIASES: PoLP +:END: +#+title: The Principle of Least Privilege +#+filetags: :sec: + +* *Definition*: + - The Principle of Least Privilege (PoLP) entails granting users, applications, or systems the minimum levels of access—or permissions—necessary to perform their functions. + - PoLP is foundational in cybersecurity, aiming to reduce the risk of accidental or malicious damage. + +* *Applications*: + - *[[id:aba08b45-c41d-4bb4-9053-bc6dd8704444][Operating Systems]]*: Use [[id:16d3b9b3-2f2a-47ef-81bf-5e045482a26f][role-based access controls]] (RBAC) to assign minimum permissions. + - *[[id:2f67eca9-5076-4895-828f-de3655444ee2][Database]] Management*: Limit access rights to essential data entities. + - *[[id:a4e712e1-a233-4173-91fa-4e145bd68769][Network]] Security*: Restrict user access at the network level (e.g., through [[id:49fee858-eb36-4230-8eb0-881df964aec8][firewalls]]). 
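+
+The applications above share one mechanic, so here is a minimal sketch of PoLP as role-based access control; the roles and permission strings are illustrative, not from any specific system:
+
+#+begin_src python
+# Minimal RBAC sketch of least privilege: each role carries only the
+# permissions its function needs, and anything not granted is denied.
+ROLE_PERMISSIONS = {
+    "analyst": {"warehouse:read"},
+    "pipeline": {"warehouse:read", "warehouse:write"},
+    "admin": {"warehouse:read", "warehouse:write", "warehouse:grant"},
+}
+
+def is_allowed(role: str, permission: str) -> bool:
+    """Deny by default: allow only permissions explicitly granted to the role."""
+    return permission in ROLE_PERMISSIONS.get(role, set())
+
+assert is_allowed("analyst", "warehouse:read")
+assert not is_allowed("analyst", "warehouse:write")  # least privilege at work
+assert not is_allowed("intern", "warehouse:read")    # unknown roles get nothing
+#+end_src
+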
+* *Benefits*:
+  - *Security*: Minimizes [[id:f0485935-d6fc-4bfa-a933-c14fd2a35da7][attack surfaces]] for potential security breaches.
+  - *[[id:6bef65b1-60da-4cc8-88bf-ee83366fa73d][Damage Control]]*: Limits the extent of damage in case of system compromise.
+  - *[[id:06cb8fe6-cf1e-4c0c-afdc-f16ab38414ef][Compliance]]*: Assists in meeting regulatory requirements related to data protection.
+
+* *Challenges*:
+  - *Complexity*: Managing permissions can become complex in large organizations.
+  - *Usability*: Over-restriction can hinder productivity if not managed wisely.
+  - *Dynamic Environments*: Continuously changing user roles require constant updates to permissions.
+
+* *Connections*:
+  - PoLP is related to *[[id:6e558dab-3173-4fab-92b7-1a339719b280][Zero Trust Architecture]]*, both emphasizing stringent access controls.
+  - It complements *[[id:c35c153d-e26b-4f73-8a8d-f960f615c7a7][Defense in Depth]]*, providing layered security by controlling access at various levels.
+  - *[[id:4e1d433c-9f6b-46c7-ad06-4f8bf798785e][Identity and Access Management]] (IAM)* systems often implement PoLP as part of broader security strategies.
diff --git a/Content/20241102085441-zero_trust_architecture.org b/Content/20241102085441-zero_trust_architecture.org
new file mode 100644
index 0000000..005ff11
--- /dev/null
+++ b/Content/20241102085441-zero_trust_architecture.org
@@ -0,0 +1,5 @@
+:PROPERTIES:
+:ID: 6e558dab-3173-4fab-92b7-1a339719b280
+:END:
+#+title: Zero Trust Architecture
+#+filetags: :sec:
diff --git a/Content/20241102085455-defense_in_depth.org b/Content/20241102085455-defense_in_depth.org
new file mode 100644
index 0000000..1cf9b81
--- /dev/null
+++ b/Content/20241102085455-defense_in_depth.org
@@ -0,0 +1,5 @@
+:PROPERTIES:
+:ID: c35c153d-e26b-4f73-8a8d-f960f615c7a7
+:END:
+#+title: Defense in Depth
+#+filetags: :sec:
diff --git a/Content/20241102085521-identity_and_access_management.org b/Content/20241102085521-identity_and_access_management.org
new file mode 100644
index 0000000..203ddea
--- /dev/null
+++ b/Content/20241102085521-identity_and_access_management.org
@@ -0,0 +1,5 @@
+:PROPERTIES:
+:ID: 4e1d433c-9f6b-46c7-ad06-4f8bf798785e
+:END:
+#+title: Identity and Access Management
+#+filetags: :sec:
diff --git a/Content/20241102085602-role_based_access_controls.org b/Content/20241102085602-role_based_access_controls.org
new file mode 100644
index 0000000..e821ff1
--- /dev/null
+++ b/Content/20241102085602-role_based_access_controls.org
@@ -0,0 +1,5 @@
+:PROPERTIES:
+:ID: 16d3b9b3-2f2a-47ef-81bf-5e045482a26f
+:END:
+#+title: Role-Based Access Controls
+#+filetags: :sec:
diff --git a/Content/20241102085707-attack_surface.org b/Content/20241102085707-attack_surface.org
new file mode 100644
index 0000000..1564ec8
--- /dev/null
+++ b/Content/20241102085707-attack_surface.org
@@ -0,0 +1,5 @@
+:PROPERTIES:
+:ID: f0485935-d6fc-4bfa-a933-c14fd2a35da7
+:END:
+#+title: Attack Surface
+#+filetags: :sec:
diff --git a/Content/20241102085718-damage_control.org b/Content/20241102085718-damage_control.org
new file mode 100644
index 0000000..f64bd42
--- /dev/null
+++ b/Content/20241102085718-damage_control.org
@@ -0,0 +1,5 @@
+:PROPERTIES:
+:ID: 6bef65b1-60da-4cc8-88bf-ee83366fa73d
+:END:
+#+title: Damage Control
+#+filetags: :meta:
diff --git a/Content/20241102094325-direct_acyclic_graph.org b/Content/20241102094325-direct_acyclic_graph.org
new file mode 100644
index 0000000..e35a65c
--- /dev/null
+++ b/Content/20241102094325-direct_acyclic_graph.org
@@ -0,0 +1,6 @@
+:PROPERTIES:
+:ID: d07976cd-5194-484e-82ab-8c55e064eeb1
+:ROAM_ALIASES: DAG
+:END:
+#+title: Directed Acyclic Graph
+#+filetags: :math:
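+
+* Overview
+A directed graph with no cycles; see [[id:78d16b5e-1893-4057-bc22-b2c9a3ca7ed6][Topological Sort]] for the classic application. As a minimal sketch of the defining property, Kahn's algorithm below yields a topological order exactly when the graph is acyclic; the adjacency encoding and node names are illustrative:
+
+#+begin_src python
+# Kahn's algorithm: returns a topological order of the nodes, or None if
+# the directed graph contains a cycle (i.e., it is not a DAG).
+from collections import deque
+
+def topological_order(edges: dict[str, list[str]]) -> list[str] | None:
+    nodes = set(edges) | {v for targets in edges.values() for v in targets}
+    indegree = {n: 0 for n in nodes}
+    for targets in edges.values():
+        for v in targets:
+            indegree[v] += 1
+    ready = deque(n for n in nodes if indegree[n] == 0)
+    order = []
+    while ready:
+        n = ready.popleft()
+        order.append(n)
+        for v in edges.get(n, []):
+            indegree[v] -= 1
+            if indegree[v] == 0:
+                ready.append(v)
+    return order if len(order) == len(nodes) else None
+
+print(topological_order({"ingest": ["transform"], "transform": ["serve"]}))
+# -> ['ingest', 'transform', 'serve']; a cyclic graph would return None
+#+end_src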