updates

rajp152k · Jun 17, 2024 · bb599d6
1 parent d703334
commit bb599d6

Showing 5 changed files with 58 additions and 7 deletions.
diff --git a/Content/20240205171209-go.org b/Content/20240205171209-go.org
@@ -13,3 +13,6 @@
  - starting out with go to get into cloud native applications and rewriting a product
 * Resources
 ** BOOK: building an orchestrator in golang
+:PROPERTIES:
+:ID:       3af62b5f-3c13-40c8-a912-18a94b7cb175
+:END:
diff --git a/Content/20240501195739-scheduling_algorithms.org b/Content/20240501195739-scheduling_algorithms.org
@@ -8,11 +8,11 @@ In computing, scheduling is the method by which threads, processes or data flows
 
 The scheduler is concerned mainly with:
 
-    throughput (total amount of work done per time unit);
-    turnaround time (between submission and completion);
-    response time (between submission and start);
-    waiting time (between job readiness and execution);
-    fairness (appropriate times according to priorities).
+ - throughput (total amount of work done per time unit);
+ - turnaround time (between submission and completion);
+ - response time (between submission and start);
+ - waiting time (between job readiness and execution);
+ - fairness (appropriate times according to priorities).
 
 In practice, these goals often conflict.
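
The timing metrics in the list above can be made concrete with a small Go sketch. The `Job` type, its field names, and the sample timestamps are hypothetical and not taken from the note. The sketch also assumes a non-preemptive model where each job starts exactly once, so waiting time is simply start minus readiness.

```go
package main

import "fmt"

// Job records the timestamps (in arbitrary time units) needed for the metrics.
// The type and its fields are hypothetical, chosen only for illustration.
type Job struct {
	Name      string
	Submitted int // when the job was submitted
	Ready     int // when the job became ready to run
	Started   int // when the job first started executing
	Completed int // when the job finished
}

// Turnaround is the time between submission and completion.
func (j Job) Turnaround() int { return j.Completed - j.Submitted }

// Response is the time between submission and start.
func (j Job) Response() int { return j.Started - j.Submitted }

// Waiting is the time between readiness and execution
// (assumes the job is never preempted once started).
func (j Job) Waiting() int { return j.Started - j.Ready }

// Throughput is the number of completed jobs per time unit, measured over
// the window from the earliest submission to the latest completion.
func Throughput(jobs []Job) float64 {
	if len(jobs) == 0 {
		return 0
	}
	first, last := jobs[0].Submitted, jobs[0].Completed
	for _, j := range jobs[1:] {
		if j.Submitted < first {
			first = j.Submitted
		}
		if j.Completed > last {
			last = j.Completed
		}
	}
	if last == first {
		return 0
	}
	return float64(len(jobs)) / float64(last-first)
}

func main() {
	jobs := []Job{
		{Name: "a", Submitted: 0, Ready: 0, Started: 0, Completed: 8},
		{Name: "b", Submitted: 1, Ready: 2, Started: 8, Completed: 10},
	}
	for _, j := range jobs {
		fmt.Printf("%s: turnaround=%d response=%d waiting=%d\n",
			j.Name, j.Turnaround(), j.Response(), j.Waiting())
	}
	fmt.Printf("throughput=%.2f jobs per time unit\n", Throughput(jobs))
}
```

In this sample, job "b" pays for job "a" running first: its response and waiting times grow even though throughput is unaffected, which is one small instance of the goals pulling against each other.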
 

diff --git a/Content/20240508163913-orchestration.org b/Content/20240508163913-orchestration.org
@@ -8,10 +8,57 @@ See [[id:d4627a77-fafc-4c76-91a2-59a84e42de71][Container]]
 * Abstract
 - provisioning automation for deployment, scaling and management of resources.
 - said resources may be defined by conceptual combinations of compute, storage and networking resources.
+* Overview
+An orchestrator can be understood in terms of the following generic terminology:
+** Task
+- smallest unit of work, usually run in a container.
+- defined by a set of specifications:
+  - resource requirements (compute, storage (disk, memory))
+  - failure policy (restart, etc.)
+  - metadata (names, tags, bookkeeping details)
+** Job
+- An aggregation of tasks
+- specifies details at a higher level than a task
+  - tasks that make up a job
+  - which data centers should run it
+  - number of replications
+  - failure policy and further types
+** Scheduler
+- schedules tasks in a job
+- decision making component
+  - check out [[id:7f960631-c727-41b8-80c2-3ccaa4ae4ba2][scheduling algorithms]]
+  - score candidates based on job compatibility
+  - allocate the job to the best-scoring machine at that moment
+** Manager
+- control plane : brains
+- orchestrates the working of all components discussed here
+- global metrics, logging, tracking etc
+- a basic manager should be able to:
+  - accept user requests to start/stop tasks
+  - schedule tasks onto workers
+  - track tasks, their states, and their addresses
+** Worker
+- runs tasks assigned to it
+- local metrics, tracking, logging, health for manager's log collation
+- supply local task stats and meta-data for manager's meta-data collation
+** Cluster
+- logical grouping of all the components above
+- towards [[id:20240519T162542.805560][High Availability]] and [[id:0d7c2dea-a250-4380-b826-ad4d2547d8d6][scalability]]
+** Interface (CLI at least)
+- query and command the orchestrator
+- the internal interface for how the above interact can also be included in the design spec of this component
+- see [[id:20240101T073142.439145][Application Programming Interface]]
 * Instances
 ** [[id:27a4d68c-adef-42aa-a4b4-b44b3f10395d][Apache Mesos]]
 ** [[id:c2072565-787a-4cea-9894-60fad254f61d][Kubernetes]]
+** Hashicorp Nomad
+** Google Borg
+- https://research.google/pubs/large-scale-cluster-management-at-google-with-borg/
 * Resources
  - https://en.wikipedia.org/wiki/Orchestration_(computing)
 
+ - [[id:3af62b5f-3c13-40c8-a912-18a94b7cb175][BOOK: building an orchestrator in golang]]
 
+ - https://ieeexplore.ieee.org/document/4291052
+
+ - https://www.researchgate.net/figure/The-deployment-of-an-ePVM-application_fig3_262408650
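
The Task, Worker and Scheduler terminology added to the orchestration note can be sketched in Go as a filter-then-score loop. This is a minimal illustration, not the design from the referenced book or from any real orchestrator: the type names, fields, `ErrNoCandidate`, and the "most headroom left" scoring heuristic are all assumptions.

```go
package main

import (
	"errors"
	"fmt"
)

// Task is the smallest unit of work. Only resource requirements are modeled;
// failure policy and metadata are omitted. All names here are hypothetical.
type Task struct {
	Name     string
	CPU      float64 // cores requested
	MemoryMB int     // memory requested
}

// Worker runs tasks; the Free* fields are the capacity not yet allocated.
type Worker struct {
	Name         string
	FreeCPU      float64
	FreeMemoryMB int
}

// ErrNoCandidate is returned when no worker can fit the task.
var ErrNoCandidate = errors.New("no worker can fit the task")

// fits reports whether the worker has room for the task.
func fits(t Task, w Worker) bool {
	return t.CPU <= w.FreeCPU && t.MemoryMB <= w.FreeMemoryMB
}

// fractionLeft returns the share of free capacity remaining after placement.
func fractionLeft(free, need float64) float64 {
	if free == 0 {
		return 0
	}
	return (free - need) / free
}

// score rates a feasible worker; higher is better. The heuristic (an
// assumption) prefers the worker with the most headroom after placement.
func score(t Task, w Worker) float64 {
	cpu := fractionLeft(w.FreeCPU, t.CPU)
	mem := fractionLeft(float64(w.FreeMemoryMB), float64(t.MemoryMB))
	return (cpu + mem) / 2
}

// Schedule filters out workers that cannot fit the task, scores the rest,
// reserves capacity on the best one, and returns its index.
func Schedule(t Task, workers []Worker) (int, error) {
	best, bestScore := -1, -1.0
	for i, w := range workers {
		if !fits(t, w) {
			continue
		}
		if s := score(t, w); s > bestScore {
			best, bestScore = i, s
		}
	}
	if best < 0 {
		return -1, ErrNoCandidate
	}
	workers[best].FreeCPU -= t.CPU
	workers[best].FreeMemoryMB -= t.MemoryMB
	return best, nil
}

func main() {
	workers := []Worker{
		{Name: "w1", FreeCPU: 2, FreeMemoryMB: 2048},
		{Name: "w2", FreeCPU: 8, FreeMemoryMB: 8192},
	}
	task := Task{Name: "web", CPU: 1, MemoryMB: 1024}
	i, err := Schedule(task, workers)
	if err != nil {
		fmt.Println("unschedulable:", err)
		return
	}
	fmt.Printf("%s -> %s\n", task.Name, workers[i].Name)
}
```

A real manager would also track task state and feed worker-reported stats back into the Free* fields; here the scheduler mutates them directly to keep the sketch short.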
diff --git a/Content/20240519162542-fault_tolerence.org b/Content/20240519162542-fault_tolerence.org
@@ -17,7 +17,6 @@ Fault tolerance is essential for critical systems where downtime or data loss ca
   - [[id:a9430614-4e6e-41ff-9788-0f51c2867e74][Hardware]]: Server crashes, disk failures, power outages.
   - [[id:d9a3aabe-114b-43c6-81f9-ca6e01ed3f46][Software]]: Bugs, crashes, security vulnerabilities.
   - [[id:a4e712e1-a233-4173-91fa-4e145bd68769][Network]]: Lost connections, congestion, cyberattacks.
-  - niques:
 * Failure Detection
 :PROPERTIES:
 :ID:       20240519T222806.511836
@@ -75,7 +74,6 @@ Fault tolerance is essential for critical systems where downtime or data loss ca
  - High Availability: Ensures critical applications and services remain accessible even with failures.
  - Data Protection: Safeguards against data loss due to hardware or software malfunctions.
  - Improved Performance: Can enhance performance by distributing workload and preventing bottlenecks.
-
 * Instances
 
 ** RAID (Redundant Array of Independent Disks)

diff --git a/Content/index.org b/Content/index.org
@@ -55,6 +55,9 @@ The author intends to utilize this document as a personal knowledge base, emphas
  - a cache for ideas that may never see the light of day again.
    - or they just might...
 * Stream
+** 0x226C
+- setup biblio + citar in doom
+- exploring neurosymbolic AI : [cite:@sheth_neurosymbolic_2023]
 ** 0x2267
  - setup an AI usage disclaimer
 ** 0x2262