Commit

[Docs] Fix Docs Page Show Errors and Update LakeSoul Version (lakesoul-io#409)

* fix version and page show errors

Signed-off-by: fphantam <[email protected]>

* fix flink sql usage error

Signed-off-by: fphantam <[email protected]>

---------

Signed-off-by: fphantam <[email protected]>
F-PHantam authored Jan 10, 2024
1 parent 45678d7 commit faa356c
Showing 6 changed files with 23 additions and 17 deletions.
8 changes: 4 additions & 4 deletions website/docs/01-Getting Started/01-setup-local-env.md
@@ -52,7 +52,7 @@ After unpacking spark package, you could find LakeSoul distribution jar from htt
wget https://dmetasoul-bucket.obs.cn-southwest-2.myhuaweicloud.com/releases/spark/spark-3.3.2-bin-hadoop-3.tgz
tar xf spark-3.3.2-bin-hadoop-3.tgz
export SPARK_HOME=${PWD}/spark-3.3.2-bin-hadoop3
-wget https://github.com/lakesoul-io/LakeSoul/releases/download/v2.4.0/lakesoul-spark-2.4.0-spark-3.3.jar -P $SPARK_HOME/jars
+wget https://github.com/lakesoul-io/LakeSoul/releases/download/v2.5.0/lakesoul-spark-2.5.0-spark-3.3.jar -P $SPARK_HOME/jars
```
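The jar name in the command above follows a fixed pattern. As a small sketch (not part of the official docs), the release URL can be derived from version numbers, assuming the `lakesoul-spark-<version>-spark-<spark minor>.jar` naming convention shown above continues to hold:

```shell
# Sketch: build the release-jar URL from version numbers.
# Assumes the naming convention shown above; adjust versions as needed.
LAKESOUL_VERSION=2.5.0
SPARK_MINOR=3.3
JAR="lakesoul-spark-${LAKESOUL_VERSION}-spark-${SPARK_MINOR}.jar"
URL="https://github.com/lakesoul-io/LakeSoul/releases/download/v${LAKESOUL_VERSION}/${JAR}"
echo "$URL"
```

This makes future version bumps a one-variable change instead of editing the URL in several places.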

:::tip
@@ -89,7 +89,7 @@ spark.sql.catalog.lakesoul | org.apache.spark.sql.lakesoul.catalog.LakeSoulCatal
spark.sql.defaultCatalog | lakesoul | set default catalog for spark

### 1.5 Setup Flink environment
-Download the LakeSoul Flink jar: https://github.com/lakesoul-io/LakeSoul/releases/download/v2.4.1/lakesoul-flink-2.4.1-flink-1.17.jar
+Download the LakeSoul Flink jar: https://github.com/lakesoul-io/LakeSoul/releases/download/v2.5.0/lakesoul-flink-2.5.0-flink-1.17.jar
Download the Flink distribution: https://dlcdn.apache.org/flink/flink-1.17.2/flink-1.17.2-bin-scala_2.12.tgz

#### 1.5.1 Start Flink SQL shell
Expand All @@ -98,7 +98,7 @@ Enter the Flink installation directory and execute the following command:
```shell
export lakesoul_home=/opt/soft/pg.property && ./bin/start-cluster.sh

-export lakesoul_home=/opt/soft/pg.property && ./bin/sql-client.sh embedded -j lakesoul-flink-2.4.1-flink-1.17.jar
+export lakesoul_home=/opt/soft/pg.property && ./bin/sql-client.sh embedded -j lakesoul-flink-2.5.0-flink-1.17.jar
```
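Both commands require `lakesoul_home` to point at a valid PostgreSQL connection properties file. As a minimal sketch, one could sanity-check that file before launching the cluster; the key names below are assumptions for illustration, not the authoritative list:

```shell
# Hedged sketch: verify the properties file lakesoul_home points at
# contains the expected PG connection keys. The key names here are
# assumptions for illustration; consult the LakeSoul docs for the real list.
check_pg_props() {
  props_file=$1
  for key in lakesoul.pg.url lakesoul.pg.username lakesoul.pg.password; do
    grep -q "^${key}=" "$props_file" || { echo "missing key: $key"; return 1; }
  done
  echo "properties file looks complete"
}
```

Usage: `check_pg_props /opt/soft/pg.property` before running `start-cluster.sh`.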

#### 1.5.2 Write data to object storage service
@@ -205,7 +205,7 @@ docker run --net lakesoul-docker-compose-env_default --rm -ti \
-v $(pwd)/lakesoul.properties:/opt/spark/work-dir/lakesoul.properties \
--env lakesoul_home=/opt/spark/work-dir/lakesoul.properties bitnami/spark:3.3.1 \
spark-shell \
-    --packages com.dmetasoul:lakesoul-spark:2.4.0-spark-3.3 \
+    --packages com.dmetasoul:lakesoul-spark:2.5.0-spark-3.3 \
--conf spark.sql.extensions=com.dmetasoul.lakesoul.sql.LakeSoulSparkSessionExtension \
--conf spark.sql.catalog.lakesoul=org.apache.spark.sql.lakesoul.catalog.LakeSoulCatalog \
--conf spark.sql.defaultCatalog=lakesoul \
4 changes: 2 additions & 2 deletions website/docs/03-Usage Docs/06-flink-lakesoul-connector.md
@@ -178,7 +178,7 @@ LakeSoul supports reading full data of LakeSoul table in batch mode and stream m
```sql
-- Set batch mode and read the test_table table
SET execution.runtime-mode = batch;
-SELECT * FROM `lakesoul`.`default`.test_table where region='China' and `date`='2023-05-10' order by id;
+SELECT * FROM `lakesoul`.`default`.test_table where region='China' and `date`='2023-05-10';
-- Set streaming mode and read the test_table table
SET execution.runtime-mode = streaming;
Expand All @@ -200,7 +200,7 @@ LakeSoul supports reading full data of LakeSoul table in batch mode and stream m
tEnvs.useCatalog("lakeSoul");
tEnvs.useDatabase("default");
-tEnvs.executeSql("SELECT * FROM test_table order by id").print();
+tEnvs.executeSql("SELECT * FROM test_table").print();
```

```java
@@ -43,10 +43,10 @@ https://dlcdn.apache.org/spark/spark-3.3.2/spark-3.3.2-bin-without-hadoop.tgz

LakeSoul release jars can be downloaded from the GitHub Releases page: https://github.com/lakesoul-io/LakeSoul/releases . After downloading, put the jar into the jars directory under the Spark installation:
```bash
-wget https://github.com/lakesoul-io/LakeSoul/releases/download/v2.4.0/lakesoul-spark-2.4.0-spark-3.3.jar -P $SPARK_HOME/jars
+wget https://github.com/lakesoul-io/LakeSoul/releases/download/v2.5.0/lakesoul-spark-2.5.0-spark-3.3.jar -P $SPARK_HOME/jars
```

-If GitHub is hard to reach, the jar can also be downloaded from: https://dmetasoul-bucket.obs.cn-southwest-2.myhuaweicloud.com/releases/lakesoul/lakesoul-spark-2.4.0-spark-3.3.jar
+If GitHub is hard to reach, the jar can also be downloaded from: https://dmetasoul-bucket.obs.cn-southwest-2.myhuaweicloud.com/releases/lakesoul/lakesoul-spark-2.5.0-spark-3.3.jar

:::tip
Since version 2.1.0, LakeSoul's own dependencies are shaded into a single jar. Earlier versions were released as multiple jars packed in a tar.gz archive.
@@ -91,7 +91,7 @@ spark.sql.catalog.lakesoul | org.apache.spark.sql.lakesoul.catalog.LakeSoulCatal
spark.sql.defaultCatalog | lakesoul

### 1.4 Set up the Flink environment
-Taking the latest release as an example, the LakeSoul Flink jar can be downloaded from: https://github.com/lakesoul-io/LakeSoul/releases/download/v2.4.1/lakesoul-flink-2.4.1-flink-1.17.jar
+Taking the latest release as an example, the LakeSoul Flink jar can be downloaded from: https://github.com/lakesoul-io/LakeSoul/releases/download/v2.5.0/lakesoul-flink-2.5.0-flink-1.17.jar

The latest release supports Flink 1.17; the Flink distribution can be downloaded from: https://dlcdn.apache.org/flink/flink-1.17.2/flink-1.17.2-bin-scala_2.12.tgz

Expand All @@ -103,7 +103,7 @@ spark.sql.defaultCatalog | lakesoul
export lakesoul_home=/opt/soft/pg.property && ./bin/start-cluster.sh

# Start the Flink SQL client
-export lakesoul_home=/opt/soft/pg.property && ./bin/sql-client.sh embedded -j lakesoul-flink-2.4.1-flink-1.17.jar
+export lakesoul_home=/opt/soft/pg.property && ./bin/sql-client.sh embedded -j lakesoul-flink-2.5.0-flink-1.17.jar
```

#### 1.4.2 Write data to the object storage service
@@ -25,12 +25,12 @@ LakeSoul | Spark Version
Run spark-shell/spark-sql with the `LakeSoulSparkSessionExtension` SQL extension.

<Tabs
-    defaultValue="Scala"
+    defaultValue="SQL"
values={[
-        {label: 'Scala', value: 'Scala'},
+        {label: 'SQL', value: 'SQL'},
]}>
-<TabItem value="Spark SQL" label="Spark SQL" default>
+<TabItem value="SQL" label="SQL" default>

```bash
spark-sql --conf spark.sql.extensions=com.dmetasoul.lakesoul.sql.LakeSoulSparkSessionExtension --conf spark.sql.catalog.lakesoul=org.apache.spark.sql.lakesoul.catalog.LakeSoulCatalog --conf spark.sql.defaultCatalog=lakesoul --jars lakesoul-spark-2.5.0-spark-3.3.jar
@@ -110,7 +110,13 @@ LOCATION 'file:/tmp/lakesoul_namespace/lakesoul_table'

```scala
// First commit will auto-initialize the table
val tablePath = "s3://lakesoul-test-bucket/test_table"
val df = Seq(("2021-01-01", 1, "rice"), ("2021-01-01", 2, "bread")).toDF("date", "id", "name")
df.write
  .mode("append")
  .format("lakesoul")
  .option("rangePartitions", "date")
  .save(tablePath)
```
</TabItem>

@@ -23,7 +23,7 @@ LakeSoul Kafka Stream mainly uses Spark Structured Streaming to implement data sync

## 1. Prepare the environment

-You can build the LakeSoul project to obtain the LakeSoul Kafka Stream jar, or fetch it, together with the other jars the job depends on, from https://dmetasoul-bucket.obs.cn-southwest-2.myhuaweicloud.com/releases/lakesoul/https://dmetasoul-bucket.obs.cn-southwest-2.myhuaweicloud.com/releases/lakesoul/lakesoul-kafka-stream-3.3.tar.gz
+You can build the LakeSoul project to obtain the LakeSoul Kafka Stream jar, or fetch it, together with the other jars the job depends on, from https://dmetasoul-bucket.obs.cn-southwest-2.myhuaweicloud.com/releases/lakesoul/lakesoul-kafka-stream-3.3.tar.gz

After downloading, unpack the tar archive and put the jars into the $SPARK_HOME/jars directory, or add the dependency jars when submitting the job, for example via --jars.
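The unpack-and-stage step above can be sketched as follows (the tarball name and the `$SPARK_HOME/jars` destination are taken from this doc; everything else is an assumption for illustration):

```shell
# Sketch: unpack the Kafka Stream tarball and stage every jar it contains
# into $SPARK_HOME/jars. Adjust paths to your environment.
stage_kafka_stream_jars() {
  tarball=$1
  jars_dir=${SPARK_HOME:?SPARK_HOME must be set}/jars
  workdir=$(mktemp -d)
  tar xzf "$tarball" -C "$workdir"
  mkdir -p "$jars_dir"
  # copy every jar found anywhere in the unpacked tree
  find "$workdir" -name '*.jar' -exec cp {} "$jars_dir"/ \;
  echo "staged $(ls "$jars_dir" | grep -c '\.jar$') jar(s) into $jars_dir"
}
```

Usage: `stage_kafka_stream_jars lakesoul-kafka-stream-3.3.tar.gz`. Alternatively, skip staging and pass the unpacked jars with `--jars` at submit time, as noted above.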

@@ -24,7 +24,7 @@ flink.warehouse.dir: "s3://bucket/path"
```
If a warehouse path is specified, a table's path defaults to `warehouse_dir/table_name`. If a `path` property is set in the table properties at creation time, that property takes precedence as the table's storage path.
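The precedence rule above can be sketched as a tiny helper (assumed behavior inferred from the paragraph above, for illustration only):

```shell
# Sketch of the table-path precedence: an explicit 'path' table property wins;
# otherwise the path defaults to warehouse_dir/table_name.
resolve_table_path() {
  warehouse_dir=$1
  table_name=$2
  path_property=$3   # empty string means the 'path' property was not set
  if [ -n "$path_property" ]; then
    echo "$path_property"
  else
    echo "$warehouse_dir/$table_name"
  fi
}
```

For example, `resolve_table_path "s3://bucket/path" "test_table" ""` prints `s3://bucket/path/test_table`.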

-To bring the LakeSoul dependency into Flink: download lakesoul-flink-2.4.0-flink-1.17.jar, put it into `$FLINK_HOME/lib`, or specify the jar path at startup.
+To bring the LakeSoul dependency into Flink: download lakesoul-flink-2.5.0-flink-1.17.jar, put it into `$FLINK_HOME/lib`, or specify the jar path at startup.

To create LakeSoul tables with Flink, the Flink SQL Client is recommended, since it lets you operate on LakeSoul tables directly with Flink SQL commands. In this document, Flink SQL statements are entered directly in the Flink SQL Client; the Table API must be written and used inside a Java project.

@@ -183,7 +183,7 @@ SET execution.runtime-mode = batch;
```sql
-- Set batch mode and read the test_table table
SET execution.runtime-mode = batch;
-SELECT * FROM `lakesoul`.`default`.test_table where region='China' and `date`='2023-05-10' order by id;
+SELECT * FROM `lakesoul`.`default`.test_table where region='China' and `date`='2023-05-10';

-- Set streaming mode and read the test_table table
SET execution.runtime-mode = streaming;
Expand All @@ -205,7 +205,7 @@ SET execution.runtime-mode = batch;
tEnvs.useCatalog("lakeSoul");
tEnvs.useDatabase("default");
-tEnvs.executeSql("SELECT * FROM test_table order by id").print();
+tEnvs.executeSql("SELECT * FROM test_table").print();
```

```java
