diff --git a/Content/20230717135201-data_engineering.org b/Content/20230717135201-data_engineering.org
index bdf84a5..8cafe16 100644
--- a/Content/20230717135201-data_engineering.org
+++ b/Content/20230717135201-data_engineering.org
@@ -20,6 +20,7 @@
  - will be indexing into relevant nodes from the stream sub nodes instead.
 * Core Nodes
 ** Data Engineering Lifecycle
+*** Overview
 
 #+begin_src plantuml :file ./images/data-eng-lifecycle.png :exports both
 @startuml
@@ -47,16 +48,38 @@ frame Applications {
 Serving =right=> Applications
 @enduml
 #+end_src
+
 #+RESULTS:
 [[file:./images/data-eng-lifecycle.png]]
 
-** Undercurrents
-*** [[id:6e9b50dc-c5c0-454d-ad99-e6b6968b221a][Security]]
-*** Data Management
-*** DataOps
-*** Data Architecture
-*** [[id:f822f8f6-89eb-4aa8-ac8f-fdcff3f06fb9][Orchestration]]
-*** [[id:5c2039f5-0c44-4926-b2d7-a8bf471923ac][Software Engineering]]
+**** Generation
+ - source systems: the origins of data in the lifecycle
+ - possibilities:
+   - [[id:b8f679c7-3ac1-48d7-b1b5-8e4743a62767][IoT]] device
+   - application [[id:1073cfed-a09d-48b6-bd52-ba09708699bf][message queue]]
+   - transactional [[id:2f67eca9-5076-4895-828f-de3655444ee2][database]]
+ - the data engineer consumes from the source systems but doesn't own them
+ - practical examples:
+   - application database
+   - IoT swarms
+**** [[id:18491388-2dcc-488f-8f33-00582cf0f77e][Storage]]
+- data architectures leverage several storage solutions to handle all kinds of data flows, stores and transitions
+- they also need side-car processing capabilities to serve complex queries
+- storage is omnipresent across the cycle: from ingestion, through the transformations sandwiched in between, to serving results
+- streaming frameworks like [[id:fa58feb4-25a2-40f1-8533-cafcb0d3886b][Apache Kafka]] and [[id:5e438030-0096-4b97-8931-f99eb7b738c5][Apache Pulsar]] can simultaneously function as ingestion, storage and query systems for messages
+**** Ingestion
+**** Transformation
+**** Serving
+**** Applications
+*** Undercurrents
+**** [[id:6e9b50dc-c5c0-454d-ad99-e6b6968b221a][Security]]
+**** Data Management
+**** DataOps
+**** Data Architecture
+**** [[id:f822f8f6-89eb-4aa8-ac8f-fdcff3f06fb9][Orchestration]]
+**** [[id:5c2039f5-0c44-4926-b2d7-a8bf471923ac][Software Engineering]]
+*** [[id:9204583f-13ab-4039-9bfc-453700f8b0d1][The Data Life Cycle]]
+ - the data engineering lifecycle is a subset of the data life cycle (explored separately)
 ** [[id:710e11f8-780a-4aa5-84fc-c0ab9bb848c0][Big Data]]
 * Tooling
 ** [[id:7aa94354-25d9-441b-993f-31ccc970edd3][Hadoop]]
diff --git a/Content/20231030092756-robotics.org b/Content/20231030092756-robotics.org
index 5e12d47..b31567b 100644
--- a/Content/20231030092756-robotics.org
+++ b/Content/20231030092756-robotics.org
@@ -2,5 +2,4 @@
 :ID: f1ec552e-a7c4-47ae-9dd2-a23733d1da92
 :END:
 #+title: Robotics
-#+filetags: :tbp:
-
+#+filetags: :electronics:
diff --git a/Content/20231227162344-computer_networks.org b/Content/20231227162344-computer_networks.org
index 3859e71..acbc0aa 100644
--- a/Content/20231227162344-computer_networks.org
+++ b/Content/20231227162344-computer_networks.org
@@ -12,6 +12,8 @@ I'll be exploring networks in a whimsical manner across several domains and will
 Also initializing a stream for a lot of the other domains that I explore will be rooted to this node in an unstructered manner.
 
 * Stream
+** 0x22F5
+ - exploring swarm networks
 ** 0x22EC
  - speed reading
 ** 0x22EB
diff --git a/Content/20240220114146-electronic_storage.org b/Content/20240220114146-electronic_storage.org
index 398dbcd..381b75b 100644
--- a/Content/20240220114146-electronic_storage.org
+++ b/Content/20240220114146-electronic_storage.org
@@ -4,27 +4,4 @@
 #+title: Electronic Storage
 #+filetags: :electronics:cs:
 
-Exploring this from the ground up - via electrons (Physics) to Logic Gates (Compute Science).
-
-Will be unstructured as I'll be visiting this node from different perspectives over time.
-
-* Misc Technical
-
-- Some degrees of freedom in the context that enable variations in the physical realization of storage device are:-
-  1. Speed of access
-  2. The underlying Physics of storage
-  3. Persistence of the data
-
-For instance, speaking about two distinct instances:
-  - [[id:24f37c35-4292-437b-b814-864251f1e44f][qubits]] (quantum information theory)
-  - the smoothened binary notion of day and night at the equator based on the position of the sun.
-
-
 * Sentinels
-** Cache
-:PROPERTIES:
-:ID: c8a3e246-0f29-4909-ab48-0d34802451d5
-:END:
-  - high speed memory taking advantage of the temporal locality of reference principle -> recenlty accessed data is likely to be accessed again.
-
-  - caches are a good first step towards improving a [[id:2f67eca9-5076-4895-828f-de3655444ee2][DataBase's]] performance under multiple accesses.
diff --git a/Content/20240717095231-message_brokers.org b/Content/20240717095231-message_brokers.org
index 735aa6a..608aa14 100644
--- a/Content/20240717095231-message_brokers.org
+++ b/Content/20240717095231-message_brokers.org
@@ -1,5 +1,6 @@
 :PROPERTIES:
 :ID: 1073cfed-a09d-48b6-bd52-ba09708699bf
+:ROAM_ALIASES: "Message Queue"
 :END:
 #+title: Message Brokers
 #+filetags: :programming:tool:data:
diff --git a/Content/20241010100357-biomimicry.org b/Content/20241010100357-biomimicry.org
index f0a0669..237e50c 100644
--- a/Content/20241010100357-biomimicry.org
+++ b/Content/20241010100357-biomimicry.org
@@ -1,5 +1,6 @@
 :PROPERTIES:
 :ID: 2ac1cb5c-fd21-41a7-a30a-d6a2080d973e
+:ROAM_ALIASES: bioMimetics
 :END:
 #+title: bioMimicry
 #+filetags: :biology:
diff --git a/Content/20241031150229-data_science_hierarchy_of_needs.org b/Content/20241031150229-data_science_hierarchy_of_needs.org
index 7606517..659840e 100644
--- a/Content/20241031150229-data_science_hierarchy_of_needs.org
+++ b/Content/20241031150229-data_science_hierarchy_of_needs.org
@@ -23,9 +23,15 @@ From Upstream (root initiatives) to Downstream (consequent initiatives)
 ** [[id:a9f08fcf-c62d-40c0-a7fb-53d7f827b5ea][anomaly detection]]
 ** prepprocessing/preparation
 * aggregate/label
+** Analytics
+** Metrics
+** Segments
+** Aggregates
+** Features
+** Training data
+* learn/optimize
 ** [[id:85ff1796-5245-4b42-8f97-64b1fc9487e0][A/B testing]]
 ** Experimentation
 ** simpler ML algorithms
-* learn/optimize
 ** [[id:db649cb6-047e-426e-8cdc-774586ef30a0][AI]]
 ** [[id:20230713T110040.814546][Deep Learning]]
diff --git a/Content/20241101165524-the_data_life_cycle.org b/Content/20241101165524-the_data_life_cycle.org
new file mode 100644
index 0000000..5161073
--- /dev/null
+++ b/Content/20241101165524-the_data_life_cycle.org
@@ -0,0 +1,5 @@
+:PROPERTIES:
+:ID: 9204583f-13ab-4039-9bfc-453700f8b0d1
+:END:
+#+title: The Data Life Cycle
+#+filetags: :data:
diff --git a/Content/20241101165737-iot.org b/Content/20241101165737-iot.org
new file mode 100644
index 0000000..39e1d9b
--- /dev/null
+++ b/Content/20241101165737-iot.org
@@ -0,0 +1,40 @@
+:PROPERTIES:
+:ID: b8f679c7-3ac1-48d7-b1b5-8e4743a62767
+:END:
+#+title: IoT
+#+filetags: :iot:
+
+* Overview
+** *Definition and Scope*
+   - The [[id:24f4040a-7c18-416a-8460-e69280d437bf][Internet]] of Things (IoT) refers to the [[id:a4e712e1-a233-4173-91fa-4e145bd68769][network]] of physical objects embedded with [[id:0bb707ba-24a5-44b3-8e23-45ade88f605c][sensors]], [[id:d9a3aabe-114b-43c6-81f9-ca6e01ed3f46][software]], and other technologies to connect and exchange data with other devices and systems over the internet.
+
+** *Key Components*
+   - *Devices and Sensors*: Physical objects (often referred to as 'things') equipped with sensors and actuators. Examples include smart home devices, wearable health monitors, and industrial sensors.
+   - *Connectivity*: Communication protocols that enable connection and data exchange between IoT devices and systems. These include Wi-Fi, Bluetooth, Zigbee, and cellular networks.
+   - *Data Processing and Analytics*: Systems that gather, process, and analyze data collected from IoT devices, providing valuable insights and enabling automated responses.
+
+** *Applications*
+   - *Smart Home*: Devices for home automation, such as smart thermostats, lighting systems, and security cameras.
+   - *Wearable Technology*: Wearable devices that monitor health and fitness parameters, like smartwatches and fitness trackers.
+   - *Industrial IoT (IIoT)*: Implementations in manufacturing, logistics, and supply chain management to improve efficiency and predictive maintenance.
+   - *Healthcare*: Remote monitoring devices for patient health, improving delivery of care and management of chronic diseases.
+   - *Smart Cities*: Urban infrastructure using IoT for traffic management, waste management, and environmental monitoring.
+
+* IoT [[id:cf3fce52-77ad-4d0d-b934-0a87978f4f46][swarms]]
+** *Definition and Concepts*
+   - IoT Swarms refer to groups of interconnected IoT devices working collaboratively to achieve a common goal. These can be compared to biological swarms (like those of bees or birds) where each entity participates in a larger system or function.
+   - Communication and Coordination: IoT swarms rely heavily on peer-to-peer communication and require sophisticated algorithms to coordinate actions among the devices.
+
+** Applications and Use Cases
+   - Environmental Monitoring: Swarms of drones that can autonomously collect data over large areas, providing insights into climate patterns or disaster management.
+   - Smart Agriculture: Utilizing swarms of IoT devices to automate and optimize farming processes, like watering, seeding, or pest control.
+   - Search and Rescue: Deploying swarms of drones or robots in search and rescue missions, where they can survey large areas quickly and efficiently.
+
+** Challenges and Considerations
+   - Scalability: Ensuring that the system can handle the coordination of potentially thousands of devices without bottlenecks.
+   - Latency and Responsiveness: Maintaining low-latency communication to ensure timely coordination and response between devices.
+   - Security: Protecting data integrity and preventing unauthorized access to the swarm network.
+
+** Connections and Implications
+   - IoT swarms draw on distributed computing, autonomous systems, and machine learning, as these technologies can help manage and optimize swarm operations.
+   - IoT swarm developments may revolutionize areas such as logistics, disaster response, and environmental conservation through enhanced automation and operational efficiency.
diff --git a/Content/20241101170336-swarm_networks.org b/Content/20241101170336-swarm_networks.org
new file mode 100644
index 0000000..32f9fd5
--- /dev/null
+++ b/Content/20241101170336-swarm_networks.org
@@ -0,0 +1,27 @@
+:PROPERTIES:
+:ID: cf3fce52-77ad-4d0d-b934-0a87978f4f46
+:END:
+#+title: Swarm Networks
+#+filetags: :meta:
+
+* Overview
+** *Definition:*
+   - Swarm networks exhibit the collective behavior of decentralized, self-organized systems. The concept is typically inspired by [[id:2ac1cb5c-fd21-41a7-a30a-d6a2080d973e][biological systems]] such as ant colonies, bird flocks, or fish schools.
+
+** *Characteristics:*
+   - Distributed control without a centralized authority.
+   - Robustness to errors and failures due to redundancy across the network.
+   - Scalability: the network can grow in size without a linear increase in complexity.
+
+** *Applications:*
+   - [[id:f1ec552e-a7c4-47ae-9dd2-a23733d1da92][Robotics]]: Swarm robotics uses multiple robots to collectively accomplish tasks that individual units cannot accomplish alone.
+   - Telecommunications: Network protocols can leverage swarm intelligence for routing and data dissemination.
+   - Optimization Problems: Metaheuristics such as Particle Swarm Optimization (PSO) and Ant Colony Optimization (ACO) search for good, though not necessarily optimal, solutions to complex computational problems by simulating swarm behaviors.
+
+** *Technologies in Use:*
+   - [[id:b8f679c7-3ac1-48d7-b1b5-8e4743a62767][IoT]] devices can apply principles of swarm intelligence to manage network traffic.
+   - Blockchain technology can leverage swarm principles for decentralized consensus mechanisms.
+
+** *Challenges:*
+   - Coordination and communication overhead in large-scale networks.
+   - Security threats due to the decentralized nature of the network and the potential for malicious entities to disrupt operations.
diff --git a/Content/20241101175831-cache.org b/Content/20241101175831-cache.org
new file mode 100644
index 0000000..aa9ae1a
--- /dev/null
+++ b/Content/20241101175831-cache.org
@@ -0,0 +1,9 @@
+:PROPERTIES:
+:ID: c8a3e246-0f29-4909-ab48-0d34802451d5
+:END:
+#+title: Cache
+#+filetags: :data:
+
+ - high-speed memory exploiting the temporal locality of reference principle -> recently accessed data is likely to be accessed again.
+
+ - caches are a good first step towards improving a [[id:2f67eca9-5076-4895-828f-de3655444ee2][database's]] performance under repeated accesses.
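The cache note states the temporal-locality principle only in prose. The sketch below is a minimal illustration in Python and assumes a least-recently-used (LRU) eviction policy. LRU is one common policy, and the note itself does not commit to any. `LRUCache`, `get_or_load`, `TABLE` and `query_db` are hypothetical names invented for this example. `query_db` stands in for a slow database read and is not any real library's API.

```python
from collections import OrderedDict
from typing import Callable, Dict, Hashable, List


class LRUCache:
    """Fixed-capacity cache that evicts the least recently used (LRU) key.

    Exploits temporal locality: every access moves a key to the
    most-recently-used end, so keys that keep being read stay cached
    while keys that go cold drift to the front and are evicted first.
    """

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._entries: "OrderedDict[Hashable, object]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get_or_load(self, key: Hashable, loader: Callable[[Hashable], object]) -> object:
        """Return the cached value, calling `loader` (the slow path) on a miss."""
        if key in self._entries:
            self._entries.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self._entries[key]
        self.misses += 1
        value = loader(key)
        self._entries[key] = value  # new keys land at the most-recently-used end
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # drop the least recently used key
        return value

    def keys(self) -> List[Hashable]:
        """Cached keys, ordered from least to most recently used."""
        return list(self._entries)


# Hypothetical stand-in for a slow database table (illustration only).
TABLE: Dict[str, int] = {"a": 1, "b": 2, "c": 3}
db_queries: List[str] = []


def query_db(key: str) -> int:
    db_queries.append(key)  # record every trip to the "database"
    return TABLE[key]


cache = LRUCache(capacity=2)
for k in ["a", "a", "b", "a", "a", "c", "a"]:
    cache.get_or_load(k, query_db)

print(cache.hits, cache.misses, cache.keys())  # 4 3 ['c', 'a']
print(db_queries)  # ['a', 'b', 'c']
```

With `capacity=2`, the seven reads cost only three `query_db` calls. The repeated reads of `a` are hits. `b` goes cold and is evicted when `c` arrives. `OrderedDict.move_to_end` and `popitem(last=False)` keep each hit and each eviction O(1).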