0x229F
rajp152k committed Aug 7, 2024
1 parent 1c64404 commit 0a3ead3
Showing 25 changed files with 325 additions and 5 deletions.
6 changes: 4 additions & 2 deletions Content/20231215153109-search.org
@@ -4,5 +4,7 @@
#+title: Search
#+filetags: :meta:



* checkout
- https://lunrjs.com/
- https://xapian.org/
- https://xapian.org/docs/omega/overview.html
1 change: 1 addition & 0 deletions Content/20240519152842-cap.org
@@ -1,5 +1,6 @@
:PROPERTIES:
:ID: 20240519T152842.050227
:ROAM_ALIASES: "Partition Tolerance(CAP)" Availability(CAP) Consistency(CAP)
:END:
#+title: CAP
#+filetags: :cs:
3 changes: 0 additions & 3 deletions Content/20240630182300-ffmpeg.org
@@ -31,9 +31,6 @@ encode -down-> output
@enduml
#+end_src

#+RESULTS:
[[file:images/ffmpeg-overview.png]]

** Core Features:
- Decoding and Encoding: Supports a wide range of multimedia formats (video, audio, subtitles) for both reading and writing.

128 changes: 128 additions & 0 deletions Content/20240717101541-apache_pulsar.org
@@ -3,3 +3,131 @@
:END:
#+title: Apache Pulsar
#+filetags: :tool:programming:data:

* Basics

An open-source, distributed publish-subscribe (pub-sub) messaging system:

** High Performance & Scalability
*** [[id:2dca77bf-c105-407f-8afc-289716ea79d5][Low Latency]]
Apache Pulsar achieves low latency through a segment-based architecture that decouples the write path (producers appending to a topic through its broker) from the read path (consumers reading through brokers, which can fetch older segments from storage).
**** Understanding Pulsar's Segmented Architecture
***** Topics are divided into Segments
A topic, which serves as the channel for messages, is divided over time into smaller segments, each stored as a BookKeeper ledger.
***** Writing (Producers)

New messages are appended to the current open segment through the broker that owns the topic. Appends are fast, much like appending data to a log file.

***** Reading (Consumers)

Consumers can read from any segment, independently of ongoing write operations. This separation offers several advantages:

- Parallelism: Multiple consumers can read from different segments of a topic concurrently, increasing overall throughput.

- Scalability: Because brokers do not own the stored segments, brokers can be added or removed without affecting the read availability of existing segments.
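
A minimal sketch of the segmenting idea, assuming a toy in-memory topic that rolls over to a new segment after a fixed number of entries. The class and method names (~SegmentedTopic~, ~append~, ~read~) are hypothetical and not part of any Pulsar client API. Positions are (segment id, offset) pairs, loosely mirroring how Pulsar message IDs pair a ledger id with an entry id.

#+begin_src python
from dataclasses import dataclass, field


@dataclass
class Segment:
    """One segment of a topic; only the newest (unsealed) one accepts appends."""
    segment_id: int
    entries: list = field(default_factory=list)
    sealed: bool = False


class SegmentedTopic:
    """Toy model of a topic split into segments (hypothetical, not Pulsar's API)."""

    def __init__(self, max_entries_per_segment: int = 3):
        self.max_entries = max_entries_per_segment
        self.segments = [Segment(0)]

    def append(self, message: str) -> tuple:
        """Write path: append to the open segment, rolling over when it is full."""
        current = self.segments[-1]
        if len(current.entries) >= self.max_entries:
            current.sealed = True  # a full segment becomes read-only
            current = Segment(current.segment_id + 1)
            self.segments.append(current)
        current.entries.append(message)
        # Position of the message: (segment id, offset within the segment).
        return (current.segment_id, len(current.entries) - 1)

    def read(self, segment_id: int, start: int = 0) -> list:
        """Read path: any segment, sealed or open, can be read independently."""
        return self.segments[segment_id].entries[start:]


topic = SegmentedTopic(max_entries_per_segment=3)
positions = [topic.append(f"msg-{i}") for i in range(7)]
#+end_src

Seven appends with a limit of three per segment produce two sealed segments and one open segment; reading segment 0 never touches the segment currently being written.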

**** [[id:4ef92e32-360e-4d76-8d8b-f7c42dcd859c][Apache BookKeeper]]

Segment storage is handled by Apache BookKeeper, a system that guarantees data durability and replication. This allows brokers to focus primarily on message routing and delivery.

Analogy: Consider a newspaper printing press. Writers (Producers) continually add news to the latest edition (representing the current segment). Concurrently, readers (Consumers) can access any past or present edition (segments) without mutual interference.

**** Caveats

- Managing segments introduces complexity compared to traditional message queue systems.

- While the segment-based architecture enables low latency, achieving exceptionally low latencies (sub-millisecond range) might necessitate specific configurations and hardware optimizations.

*** Horizontal Scaling
Pulsar separates compute (brokers) and storage (BookKeeper), allowing each to scale independently. This enables handling massive throughput (millions of messages per second) by adding more brokers or bookies.
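A back-of-the-envelope sketch of sizing the two layers separately. Every capacity figure below (messages per broker, bytes per bookie) is a hypothetical input rather than a Pulsar benchmark, and the model ignores bookie write bandwidth, caching, and headroom.

#+begin_src python
import math


def size_cluster(msgs_per_sec: int, msg_bytes: int, retention_sec: int,
                 broker_capacity_msgs: int, bookie_capacity_bytes: int,
                 replicas: int) -> dict:
    """Estimate broker and bookie counts independently (hypothetical capacities)."""
    # Compute layer: brokers scale with the message rate they must route.
    brokers = math.ceil(msgs_per_sec / broker_capacity_msgs)
    # Storage layer: bookies scale with retained bytes times the replica count,
    # and there must be at least as many bookies as replicas.
    stored_bytes = msgs_per_sec * msg_bytes * retention_sec * replicas
    bookies = max(replicas, math.ceil(stored_bytes / bookie_capacity_bytes))
    return {"brokers": brokers, "bookies": bookies}


# 1M msgs/s of 1 KB messages, 200k msgs/s per broker, 2 TB per bookie, 3 replicas.
one_hour = size_cluster(1_000_000, 1_000, 3_600, 200_000, 2 * 10**12, 3)
two_hours = size_cluster(1_000_000, 1_000, 7_200, 200_000, 2 * 10**12, 3)
#+end_src

Doubling retention roughly doubles the bookie count while the broker count stays the same, which is the point of separating compute from storage.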
** Durability & Fault Tolerance
*** Disk Persistence
Messages are durably stored on disk in BookKeeper, ensuring data isn't lost even if brokers restart.
*** Replication
Data is replicated across multiple bookies for fault tolerance. If one bookie fails, others can seamlessly take over.
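A sketch of how BookKeeper spreads each entry across bookies, assuming its round-robin striping scheme: a ledger has an ensemble of E bookies, each entry is written to a write quorum of Qw of them, and the write counts as acknowledged once an ack quorum of Qa bookies confirm it. The function names are hypothetical. In Pulsar these three values are typically set through broker settings such as ~managedLedgerDefaultEnsembleSize~, ~managedLedgerDefaultWriteQuorum~ and ~managedLedgerDefaultAckQuorum~.

#+begin_src python
def write_set(entry_id: int, ensemble: list, write_quorum: int) -> list:
    """Bookies that store one entry: Qw consecutive bookies, starting at entry_id mod E."""
    size = len(ensemble)
    if write_quorum > size:
        raise ValueError("write quorum cannot exceed ensemble size")
    return [ensemble[(entry_id + i) % size] for i in range(write_quorum)]


def is_acknowledged(acks: set, entry_id: int, ensemble: list,
                    write_quorum: int, ack_quorum: int) -> bool:
    """True once at least ack_quorum bookies from the entry's write set have confirmed."""
    targets = write_set(entry_id, ensemble, write_quorum)
    return sum(1 for bookie in targets if bookie in acks) >= ack_quorum


# E = 4, Qw = 3, Qa = 2
ensemble = ["bookie-1", "bookie-2", "bookie-3", "bookie-4"]
#+end_src

With Qa = 2, every acknowledged entry exists on at least two bookies, so it survives the loss of any single bookie; striping also spreads write load across all four bookies instead of three.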
*** Write-Ahead Logging in [[id:4ef92e32-360e-4d76-8d8b-f7c42dcd859c][Apache BookKeeper]]

Write-ahead logging (WAL) is a fundamental mechanism within Apache BookKeeper, the distributed storage system that underpins Pulsar's message persistence. It ensures data durability by writing every change to a log (called the journal in BookKeeper) before applying it to the main storage.

**** How it Works

1. Log File: When a new entry (message) is written to a bookie, it is first appended to the write-ahead log (the journal) on disk.

2. In-Memory Cache: The entry is also cached in memory for fast access.

3. Acknowledgement and Flush: Once the journal write has been synced to disk, the bookie acknowledges the write. Cached entries are later flushed to the main ledger storage (entry log files on disk) in the background.

4. Durability Guarantee: Even if the bookie crashes before the data reaches main storage, the journal can be replayed upon recovery, so no acknowledged write is lost.
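
A toy model of the four steps above, assuming a single bookie whose journal survives a crash while its in-memory cache does not. ~ToyBookie~ and its methods are hypothetical; real bookies also batch journal syncs, checkpoint, and reclaim old journal files, none of which is modelled here.

#+begin_src python
class ToyBookie:
    """Write-ahead logging sketch; not BookKeeper's implementation."""

    def __init__(self):
        self.journal = []         # durable, append-only WAL (survives crashes)
        self.cache = {}           # in-memory entries (lost on crash)
        self.ledger_storage = {}  # durable main storage, updated lazily

    def add_entry(self, entry_id: int, data: str) -> str:
        self.journal.append((entry_id, data))  # 1. log first
        self.cache[entry_id] = data            # 2. keep a copy in memory
        return "ack"                           # 3. ack once the log write is done

    def flush(self) -> None:
        """Background step: move cached entries into main storage."""
        self.ledger_storage.update(self.cache)
        self.cache.clear()

    def crash(self) -> None:
        """Simulate a crash: everything held only in memory is lost."""
        self.cache.clear()

    def recover(self) -> None:
        """4. Replay the journal so main storage holds every acknowledged entry."""
        for entry_id, data in self.journal:
            self.ledger_storage[entry_id] = data

    def read(self, entry_id: int):
        """Serve from the cache if possible, otherwise from main storage."""
        return self.cache.get(entry_id, self.ledger_storage.get(entry_id))


bookie = ToyBookie()
bookie.add_entry(0, "first")
bookie.flush()                 # entry 0 reaches main storage
bookie.add_entry(1, "second")  # acknowledged, but only in journal + cache
bookie.crash()
lost_before_replay = bookie.read(1)
bookie.recover()
#+end_src

Entry 1 was acknowledged but never flushed, so it is unreadable right after the crash and comes back only through journal replay.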

**** Effects

- Durability: Data is persistent even in the event of system failures.

- Crash Recovery: Upon recovery, BookKeeper can replay the write-ahead log, restoring data to its consistent state.

- Performance: Journal writes are sequential appends, which are cheap, and the in-memory cache speeds up reads of recent entries; the slower writes to main storage happen in the background.

**** Caveats

- Log File Size: The write-ahead log file can grow over time, potentially impacting disk space and performance.

- Disk I/O: The WAL process involves disk I/O operations, which can impact performance if the system is under heavy load.

** Multi-tenancy
*** Tenant Isolation
Pulsar provides logical isolation between tenants (different applications or organizations) for security and resource management.
*** Resource Allocation
Administrators can allocate resources (topics, bandwidth, storage) to specific tenants, ensuring fair usage and predictable performance.
*** Access Control
Granular access control mechanisms restrict tenant access to specific resources, enhancing security and data privacy.
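
A sketch of how tenant isolation shows up in practice: Pulsar topic names are fully qualified as ~persistent://tenant/namespace/topic~ (or ~non-persistent://...~), so every topic is scoped to a tenant and a namespace. The ACL table and ~is_allowed~ check below are hypothetical stand-ins for Pulsar's authorization layer, where permissions such as produce and consume are granted per namespace or topic. Legacy name formats and partition suffixes are ignored.

#+begin_src python
from typing import NamedTuple


class TopicName(NamedTuple):
    domain: str
    tenant: str
    namespace: str
    topic: str


def parse_topic(name: str) -> TopicName:
    """Split a fully qualified name such as persistent://tenant/namespace/topic."""
    domain, sep, rest = name.partition("://")
    if not sep or domain not in ("persistent", "non-persistent"):
        raise ValueError(f"unsupported topic name: {name}")
    parts = rest.split("/")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected tenant/namespace/topic in: {name}")
    return TopicName(domain, *parts)


# Hypothetical ACL table: (tenant, namespace) -> role -> allowed actions.
ACL = {
    ("acme", "orders"): {"order-service": {"produce", "consume"}},
    ("globex", "billing"): {"billing-app": {"consume"}},
}


def is_allowed(role: str, action: str, topic: str) -> bool:
    """A role only gets the actions granted on the topic's own tenant/namespace."""
    t = parse_topic(topic)
    return action in ACL.get((t.tenant, t.namespace), {}).get(role, set())
#+end_src

Because the tenant is part of every topic name, a role granted rights in one tenant has no implicit access to topics in another.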

** Understanding [[id:e9973a5d-a0bb-49b5-9767-af6df7a459eb][Geo-Replication]]

Geo-replication in Pulsar ensures data durability and low latency across
geographically distributed systems.

It effectively mirrors data across multiple data centers, ensuring application continuity even if one data center experiences an outage.

*** Working Mechanism
1. Clusters & Namespaces: Pulsar deploys in clusters, each capable of
hosting multiple tenants and namespaces, which logically group
topics.
2. Replication Policies: Policies are defined to replicate specific
namespaces across clusters. These policies determine the number and
location of data copies.
3. Asynchronous Replication: Data is replicated asynchronously from the
origin cluster to remote clusters. This approach ensures low latency
for producers and consumers, even during replication.
4. Automatic Failover (Optional): In the event of an outage, Pulsar
clients can be configured to fail over to a healthy cluster, minimizing downtime.
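A toy model of asynchronous replication, assuming one replication cursor per remote cluster: producers are acknowledged after the local write, and each remote catches up on its own schedule. ~Cluster~, ~GeoReplicatedTopic~ and the cluster names are hypothetical; Pulsar's actual replicators, batching, and retries are not modelled.

#+begin_src python
class Cluster:
    """One cluster's copy of a replicated topic (toy model)."""

    def __init__(self, name: str):
        self.name = name
        self.log = []


class GeoReplicatedTopic:
    """Asynchronous geo-replication sketch with one cursor per remote cluster."""

    def __init__(self, origin: Cluster, remotes: list):
        self.origin = origin
        self.remotes = {r.name: r for r in remotes}
        # Index of the next origin message each remote still needs.
        self.cursors = {r.name: 0 for r in remotes}

    def publish(self, msg: str) -> str:
        # The producer waits only for the local write; remotes are not contacted.
        self.origin.log.append(msg)
        return "ack"

    def replicate(self, remote_name: str, max_msgs: int = 1) -> int:
        """Background step: ship up to max_msgs pending messages to one remote."""
        start = self.cursors[remote_name]
        batch = self.origin.log[start:start + max_msgs]
        self.remotes[remote_name].log.extend(batch)
        self.cursors[remote_name] = start + len(batch)
        return len(batch)

    def lag(self, remote_name: str) -> int:
        """How many messages a remote is behind the origin."""
        return len(self.origin.log) - self.cursors[remote_name]


us_west, eu_central, ap_south = Cluster("us-west"), Cluster("eu-central"), Cluster("ap-south")
topic = GeoReplicatedTopic(us_west, [eu_central, ap_south])
for m in ["a", "b", "c"]:
    topic.publish(m)
topic.replicate("eu-central", max_msgs=2)
#+end_src

The trade-off is visible in the lag: publishing stays fast, but a remote cluster may briefly hold fewer messages than the origin.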
*** Benefits
- Disaster Recovery: Data is protected from regional outages.
- Low Latency: Users connect to the nearest cluster, minimizing data access delays.
- Data Locality: Compliance with data sovereignty regulations is simplified by keeping data within specific geographical boundaries.
** Tiered Storage
Tiered storage in Apache Pulsar optimizes message storage by strategically distributing data across different storage tiers based on access frequency and retention requirements. This approach resembles a filing cabinet with different drawers for different types of documents; frequently used documents are kept in the top drawer, while less frequently accessed ones are in lower drawers.

*** Working Mechanism

1. Primary Storage (Hot Tier): Messages are initially written to BookKeeper, typically backed by fast storage such as SSDs or NVMe drives. This keeps latency low for producers and for consumers reading recently written messages.

2. Offload Trigger: Once a configured offload threshold is crossed (for example, when the data a topic keeps in BookKeeper exceeds a set size), older sealed segments become eligible to leave the bookies. Offloading can also be triggered manually.

3. Long-term Storage (Cold Tier): Offloaded segments are copied to cheaper long-term storage, such as cloud object stores (e.g., Amazon S3, Google Cloud Storage) or HDFS, and can then be deleted from BookKeeper. Consumers can still read them transparently.
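
A minimal sketch of threshold-driven offload, assuming segments move oldest first once the hot tier holds more than a size threshold, and that the segment still being written is never moved. ~StoredSegment~, ~offload~ and the byte figures are hypothetical; an age-based trigger would follow the same oldest-first pattern, and Pulsar's actual offloader drivers and settings are not modelled.

#+begin_src python
from dataclasses import dataclass


@dataclass
class StoredSegment:
    segment_id: int      # lower ids are older
    size_bytes: int
    sealed: bool = True  # only sealed segments can be offloaded
    tier: str = "hot"    # "hot" = BookKeeper, "cold" = long-term storage


def offload(segments: list, hot_threshold_bytes: int) -> list:
    """Move the oldest sealed segments to the cold tier until hot data fits.

    Returns the ids of the segments that were moved.
    """
    hot_bytes = sum(s.size_bytes for s in segments if s.tier == "hot")
    moved = []
    for seg in sorted(segments, key=lambda s: s.segment_id):  # oldest first
        if hot_bytes <= hot_threshold_bytes:
            break
        if seg.tier == "hot" and seg.sealed:
            seg.tier = "cold"
            hot_bytes -= seg.size_bytes
            moved.append(seg.segment_id)
    return moved


# Three sealed 100-byte segments plus the open one currently being written.
segments = [StoredSegment(i, 100) for i in range(3)] + [StoredSegment(3, 100, sealed=False)]
moved = offload(segments, hot_threshold_bytes=250)
#+end_src

Recent data stays on the fast tier, so most reads keep low latency, while older segments stop consuming bookie disk.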

*** Benefits

**** Cost Optimization
Using cheaper storage for less frequently accessed data reduces overall storage costs.

**** Performance Improvement
By keeping frequently accessed data on faster storage, Pulsar maintains low latency for most operations.

**** Scalability
The tiered storage architecture allows Pulsar to handle larger datasets without incurring excessive storage costs.

**** Flexibility
Different retention policies can be applied to different namespaces or topics, providing flexibility in managing data lifecycle.

* Relevant Nodes
** [[id:1073cfed-a09d-48b6-bd52-ba09708699bf][Message Brokers]]
44 changes: 44 additions & 0 deletions Content/20240805105656-apache_bookkeeper.org
@@ -0,0 +1,44 @@
:PROPERTIES:
:ID: 4ef92e32-360e-4d76-8d8b-f7c42dcd859c
:END:
#+title: Apache BookKeeper
#+filetags: :tool:compute:

* Basics

- a distributed log storage service.

** Key Features

- provides high durability and [[id:20240519T152842.050227][availability]] for storing logs (write-ahead logs, transaction logs).

- offers a scalable, [[id:20240519T162542.805560][fault-tolerant]], and low-latency storage solution.

- ensures consistent, ordered, and replicated logs.

** Working Mechanism

bookkeeper splits the work across three roles:

- *bookies:* individual storage nodes (similar to datanodes in [[id:7aa94354-25d9-441b-993f-31ccc970edd3][hadoop]]) responsible for storing log fragments.

- *metadata storage:* tracks bookie locations and log segment metadata ([[id:b635cd13-0e7b-4d3e-aa3e-24ad0c3df768][zookeeper]] is commonly used).

- *clients:* write and read entries to/from bookkeeper (e.g., [[id:5e438030-0096-4b97-8931-f99eb7b738c5][apache pulsar]]).
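
A sketch of the metadata side, assuming a simplified view of per-ledger metadata: each ledger records its ensembles as fragments keyed by the first entry id they cover, and when a bookie fails mid-write the client starts a new fragment with a replacement bookie. The dictionary layout and ~ensemble_for~ are hypothetical simplifications, not the exact schema kept in the metadata store.

#+begin_src python
import bisect

# Hypothetical ledger metadata: first entry id of each fragment -> its bookies.
ledger_metadata = {
    "ensembles": {
        0: ["bookie-a", "bookie-b", "bookie-c"],
        100: ["bookie-a", "bookie-d", "bookie-c"],  # bookie-b replaced from entry 100 on
    }
}


def ensemble_for(metadata: dict, entry_id: int) -> list:
    """Return the bookies for an entry: the fragment with the largest start <= entry_id."""
    starts = sorted(metadata["ensembles"])
    idx = bisect.bisect_right(starts, entry_id) - 1
    if idx < 0:
        raise KeyError(f"no fragment covers entry {entry_id}")
    return metadata["ensembles"][starts[idx]]
#+end_src

A reader first consults the metadata to learn which bookies hold an entry, then fetches the entry from those bookies directly.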

** Use Cases

- [[id:a3d0278d-d7b7-47d8-956d-838b79396da7][distributed]] [[id:1073cfed-a09d-48b6-bd52-ba09708699bf][messaging systems]]: providing durable and replicated storage for message streams.

- write-ahead logging: ensuring data consistency and recovery for databases and other systems.

- ledger storage: offering a reliable and performant foundation for distributed ledgers.

** Advantages

- *high throughput and low latency:* designed for high-performance logging operations.
- *scalability:* easily scales horizontally by adding more bookie nodes.
- *durability and availability:* replicated storage ensures data durability and availability.

* Resources
- https://bookkeeper.apache.org/docs/
6 changes: 6 additions & 0 deletions Content/20240805111103-apache_zookeeper.org
@@ -0,0 +1,6 @@
:PROPERTIES:
:ID: b635cd13-0e7b-4d3e-aa3e-24ad0c3df768
:END:

#+title: Apache Zookeeper
#+filetags: :compute:tool:
5 changes: 5 additions & 0 deletions Content/20240805111148-hadoop.org
@@ -0,0 +1,5 @@
:PROPERTIES:
:ID: 7aa94354-25d9-441b-993f-31ccc970edd3
:END:
#+title: Hadoop
#+filetags: :tool:data:
5 changes: 5 additions & 0 deletions Content/20240805112340-geo_replication.org
@@ -0,0 +1,5 @@
:PROPERTIES:
:ID: e9973a5d-a0bb-49b5-9767-af6df7a459eb
:END:
#+title: Geo-Replication
#+filetags: :compute:data:
39 changes: 39 additions & 0 deletions Content/20240807084318-business.org
@@ -0,0 +1,39 @@
:PROPERTIES:
:ID: b5576a88-d12a-4779-958b-03ad4f4c6403
:END:
#+title: Business
#+filetags: :bs:


* Abstract

** Personal
This node set covers the pragmatics of business, adapted to my (ever-evolving) incentives.

I have some formal education in Business (two years of Business Studies at GCSE, ages 14-16), and I have always been inclined towards creating value in a sustainable manner, with personal temporal freedom also being a major motivator.

Having recently begun working on [[id:95dd2f7c-e699-4ff6-9f40-52d573527107][theBitMage]], I want to fill in, over time, the gaps that a lack of practice has left in my approach.

This node-set is going to be a deep practical analysis of the aspects of business applied to my needs.

I am, as of 0x229F, in the dark when it comes to the ground realities of operating a business. [[id:ab7c582a-9b00-4e4d-a71c-302efdc1f0e7][The iterative Read-Act-Write (RAW) protocol]] is being applied to this node set, so its contents can be considered reliable.

In parallel with a foundational read, I'll be conducting experiments on the aspects of this node set.

** The Etymology

#+begin_quote
"Business" originates from the Old English "bisignes", meaning "care, anxiety, occupation", highlighting its association with effort and activity. Over time, it evolved to encompass trade and commerce, emphasizing the transactional nature of engaging in an endeavor for profit.
#+end_quote

** Overview of the Aspects

Business encompasses the [[id:20240114T175025.020370][creation]], [[id:39bb719f-702a-4fb4-9e61-e35e55540bb1][exchange]], and [[id:fd0a58b4-1324-43c7-887f-54c332e1bbbe][delivery]] of [[id:c9942084-31af-424e-bc2b-41800004fa24][value]] to customers, aiming to generate [[id:9d500676-9ae0-48a9-a9c5-f2d87f8ca64c][profit]] and ensure [[id:19e192af-33a5-435f-a4b8-f1e1c7932608][sustainability]]. It involves understanding [[id:0e59bb86-1735-47d0-9373-fed97a835b50][customer]] [[id:9403896e-dcc6-4c02-b492-3f31bb901f54][needs]], developing [[id:85d33895-5734-4df9-97f7-e30a7a0640b2][solutions]], [[id:3242595d-3773-4e24-8e37-b8155a6e9187][managing]] [[id:026cebc6-e388-4e7e-84ce-b46d8f3151a9][operations]], and navigating [[id:f271220c-7cdc-449e-b5e6-90b6583b0fae][market]] dynamics.

Importantly, business [[id:87d0947f-665c-4c88-96bd-86d8cf2fe017][success]] requires [[id:964a55a3-eb75-4ce6-b799-21ae0efed6dd][balancing]] profitability with [[id:6d388392-d79b-4f44-84cf-fdc985b6d144][ethical]] considerations, social impact, and regulatory [[id:06cb8fe6-cf1e-4c0c-afdc-f16ab38414ef][compliance]].

* Resources
** Book : The Personal MBA
:PROPERTIES:
:ID: d9166a1b-cca7-4167-939c-2a2256485e5d
:END:
5 changes: 5 additions & 0 deletions Content/20240807084845-exchange.org
@@ -0,0 +1,5 @@
:PROPERTIES:
:ID: 39bb719f-702a-4fb4-9e61-e35e55540bb1
:END:
#+title: Trade
#+filetags: :bs:
6 changes: 6 additions & 0 deletions Content/20240807085001-value.org
@@ -0,0 +1,6 @@
:PROPERTIES:
:ID: c9942084-31af-424e-bc2b-41800004fa24
:ROAM_ALIASES: worth
:END:
#+title: Value
#+filetags: :meta:
5 changes: 5 additions & 0 deletions Content/20240807085035-profit.org
@@ -0,0 +1,5 @@
:PROPERTIES:
:ID: 9d500676-9ae0-48a9-a9c5-f2d87f8ca64c
:END:
#+title: profit
#+filetags: :bs:
5 changes: 5 additions & 0 deletions Content/20240807085048-sustainability.org
@@ -0,0 +1,5 @@
:PROPERTIES:
:ID: 19e192af-33a5-435f-a4b8-f1e1c7932608
:END:
#+title: sustainability
#+filetags: :bs:
5 changes: 5 additions & 0 deletions Content/20240807085104-needs.org
@@ -0,0 +1,5 @@
:PROPERTIES:
:ID: 9403896e-dcc6-4c02-b492-3f31bb901f54
:END:
#+title: needs
#+filetags: :meta:
5 changes: 5 additions & 0 deletions Content/20240807085115-customer.org
@@ -0,0 +1,5 @@
:PROPERTIES:
:ID: 0e59bb86-1735-47d0-9373-fed97a835b50
:END:
#+title: customer
#+filetags: :bs:
5 changes: 5 additions & 0 deletions Content/20240807085132-solutions.org
@@ -0,0 +1,5 @@
:PROPERTIES:
:ID: 85d33895-5734-4df9-97f7-e30a7a0640b2
:END:
#+title: solutions
#+filetags: :meta:
5 changes: 5 additions & 0 deletions Content/20240807085144-operations.org
@@ -0,0 +1,5 @@
:PROPERTIES:
:ID: 026cebc6-e388-4e7e-84ce-b46d8f3151a9
:END:
#+title: operations
#+filetags: :bs:
5 changes: 5 additions & 0 deletions Content/20240807085213-management.org
@@ -0,0 +1,5 @@
:PROPERTIES:
:ID: 3242595d-3773-4e24-8e37-b8155a6e9187
:END:
#+title: Management
#+filetags: :bs:
5 changes: 5 additions & 0 deletions Content/20240807085238-market.org
@@ -0,0 +1,5 @@
:PROPERTIES:
:ID: f271220c-7cdc-449e-b5e6-90b6583b0fae
:END:
#+title: market
#+filetags: :bs:
5 changes: 5 additions & 0 deletions Content/20240807085312-balance.org
@@ -0,0 +1,5 @@
:PROPERTIES:
:ID: 964a55a3-eb75-4ce6-b799-21ae0efed6dd
:END:
#+title: balance
#+filetags: :meta:
5 changes: 5 additions & 0 deletions Content/20240807085405-success.org
@@ -0,0 +1,5 @@
:PROPERTIES:
:ID: 87d0947f-665c-4c88-96bd-86d8cf2fe017
:END:
#+title: success
#+filetags: :bs:
5 changes: 5 additions & 0 deletions Content/20240807085439-compliance.org
@@ -0,0 +1,5 @@
:PROPERTIES:
:ID: 06cb8fe6-cf1e-4c0c-afdc-f16ab38414ef
:END:
#+title: compliance
#+filetags: :bs:
5 changes: 5 additions & 0 deletions Content/20240807085501-delivery.org
@@ -0,0 +1,5 @@
:PROPERTIES:
:ID: fd0a58b4-1324-43c7-887f-54c332e1bbbe
:END:
#+title: delivery
#+filetags: :meta:
5 changes: 5 additions & 0 deletions Content/20240807090946-thebitmage.org
@@ -0,0 +1,5 @@
:PROPERTIES:
:ID: 95dd2f7c-e699-4ff6-9f40-52d573527107
:END:
#+title: theBitMage
#+filetags: :bs:
17 changes: 17 additions & 0 deletions Content/20240807092252-the_read_act_write_protocol.org
@@ -0,0 +1,17 @@
:PROPERTIES:
:ID: ab7c582a-9b00-4e4d-a71c-302efdc1f0e7
:ROAM_ALIASES: RAW
:END:
#+title: The Read-Act-Write Protocol
#+filetags: :meta:protocol:

I don't know a lot about a lot.
I wish to know a lot about a lot.

This framework has worked for me when it comes to gaining competency in a domain:
- research sweeps i.e. Read
- conduct informed experiments i.e. Act
- maintain records and consolidate i.e. Write
- iterate until satisfaction

I apply this only to domains that I seriously consider worth the time. Otherwise, I roam around freely, wherever my whims take me. So if you've been directed to this node from a node set somewhere, it means I'm investing seriously in the study of that node set, and its content is more reliable than that of a non-RAW node set.
