Fixes the issue that ShardingSphere cannot connect to HiveServer2 using remote Hive Metastore Server
linghengqian committed Nov 28, 2024
1 parent 4e5b0ce commit 22ae9c1
Showing 7 changed files with 298 additions and 10 deletions.
@@ -40,6 +40,17 @@ ShardingSphere's support for the HiveServer2 JDBC Driver is located in an optional module.
<artifactId>hive-service</artifactId>
<version>4.0.1</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-mapreduce-client-core</artifactId>
<version>3.3.6</version>
<exclusions>
<exclusion>
<groupId>*</groupId>
<artifactId>*</artifactId>
</exclusion>
</exclusions>
</dependency>
</dependencies>
```

@@ -81,6 +92,17 @@ ShardingSphere's support for the HiveServer2 JDBC Driver is located in an optional module.
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-mapreduce-client-core</artifactId>
<version>3.3.6</version>
<exclusions>
<exclusion>
<groupId>*</groupId>
<artifactId>*</artifactId>
</exclusion>
</exclusions>
</dependency>
</dependencies>
```

@@ -427,8 +449,31 @@ ShardingSphere only runs integration tests against HiveServer2 `4.0.1`.
### Hadoop Limitations

Users can only use Hadoop `3.3.6` as the underlying Hadoop dependency of HiveServer2 JDBC Driver `4.0.1`.
- HiveServer2 JDBC Driver `4.0.1` does not support Hadoop `3.4.1`,
- see https://github.com/apache/hive/pull/5500 .
+ HiveServer2 JDBC Driver `4.0.1` does not support Hadoop `3.4.1`, see https://github.com/apache/hive/pull/5500 .

Neither the HiveServer2 JDBC Driver `org.apache.hive:hive-jdbc:4.0.1` nor `org.apache.hive:hive-jdbc:4.0.1` with `classifier` set to `standalone`
actually depends on `org.apache.hadoop:hadoop-mapreduce-client-core:3.3.6`.

However, `org.apache.shardingsphere.infra.database.hive.metadata.data.loader.HiveMetaDataLoader` in
`org.apache.shardingsphere:shardingsphere-infra-database-hive` uses `org.apache.hadoop.hive.conf.HiveConf`,
which in turn uses the `org.apache.hadoop.mapred.JobConf` class from `org.apache.hadoop:hadoop-mapreduce-client-core:3.3.6`.

ShardingSphere only needs the `org.apache.hadoop.mapred.JobConf` class,
so it is reasonable to exclude all transitive dependencies of `org.apache.hadoop:hadoop-mapreduce-client-core:3.3.6`.

```xml
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-mapreduce-client-core</artifactId>
<version>3.3.6</version>
<exclusions>
<exclusion>
<groupId>*</groupId>
<artifactId>*</artifactId>
</exclusion>
</exclusions>
</dependency>
```

### SQL Limitations

@@ -41,6 +41,17 @@ The possible Maven dependencies are as follows.
<artifactId>hive-service</artifactId>
<version>4.0.1</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-mapreduce-client-core</artifactId>
<version>3.3.6</version>
<exclusions>
<exclusion>
<groupId>*</groupId>
<artifactId>*</artifactId>
</exclusion>
</exclusions>
</dependency>
</dependencies>
```

@@ -83,6 +94,17 @@ The following is an example of a possible configuration,
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-mapreduce-client-core</artifactId>
<version>3.3.6</version>
<exclusions>
<exclusion>
<groupId>*</groupId>
<artifactId>*</artifactId>
</exclusion>
</exclusions>
</dependency>
</dependencies>
```

@@ -433,8 +455,31 @@ Reference https://issues.apache.org/jira/browse/HIVE-28418.
### Hadoop Limitations

Users can only use Hadoop `3.3.6` as the underlying Hadoop dependency of HiveServer2 JDBC Driver `4.0.1`.
- HiveServer2 JDBC Driver `4.0.1` does not support Hadoop `3.4.1`,
- Reference https://github.com/apache/hive/pull/5500.
+ HiveServer2 JDBC Driver `4.0.1` does not support Hadoop `3.4.1`. Reference https://github.com/apache/hive/pull/5500 .

Neither the HiveServer2 JDBC Driver `org.apache.hive:hive-jdbc:4.0.1` nor `org.apache.hive:hive-jdbc:4.0.1` with `classifier` set to `standalone`
actually depends on `org.apache.hadoop:hadoop-mapreduce-client-core:3.3.6`.

However, `org.apache.shardingsphere.infra.database.hive.metadata.data.loader.HiveMetaDataLoader` in
`org.apache.shardingsphere:shardingsphere-infra-database-hive` uses `org.apache.hadoop.hive.conf.HiveConf`,
which in turn uses the `org.apache.hadoop.mapred.JobConf` class from `org.apache.hadoop:hadoop-mapreduce-client-core:3.3.6`.

ShardingSphere only needs the `org.apache.hadoop.mapred.JobConf` class,
so it is reasonable to exclude all transitive dependencies of `org.apache.hadoop:hadoop-mapreduce-client-core:3.3.6`.

```xml
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-mapreduce-client-core</artifactId>
<version>3.3.6</version>
<exclusions>
<exclusion>
<groupId>*</groupId>
<artifactId>*</artifactId>
</exclusion>
</exclusions>
</dependency>
```
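
One way to confirm that the dependency above is what puts `org.apache.hadoop.mapred.JobConf` on the classpath is a small lookup check. The following is a minimal sketch and not part of this commit; the class name `JobConfClasspathCheck` and the helper `isOnClasspath` are hypothetical names introduced only for illustration.

```java
public final class JobConfClasspathCheck {

    private JobConfClasspathCheck() {
    }

    // Hypothetical helper: returns true when the named class can be located
    // by this class's class loader, without running its static initializers.
    static boolean isOnClasspath(final String className) {
        try {
            Class.forName(className, false, JobConfClasspathCheck.class.getClassLoader());
            return true;
        } catch (final ClassNotFoundException | LinkageError ex) {
            return false;
        }
    }

    public static void main(final String[] args) {
        // The class that HiveConf is described as needing in the section above.
        String jobConf = "org.apache.hadoop.mapred.JobConf";
        System.out.println(jobConf + " present: " + isOnClasspath(jobConf));
    }
}
```

Run with and without the `hadoop-mapreduce-client-core` dependency declared, the printed flag should change accordingly, assuming no other artifact on the classpath ships the same class.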

### SQL Limitations

1 change: 1 addition & 0 deletions pom.xml
@@ -129,6 +129,7 @@
<clickhouse-jdbc.version>0.6.3</clickhouse-jdbc.version>
<hive.version>4.0.1</hive.version>
<hive-server2-jdbc-driver-thin.version>1.5.0</hive-server2-jdbc-driver-thin.version>
<hadoop.version>3.3.6</hadoop.version>
<presto.version>0.288.1</presto.version>

<hikari-cp.version>4.0.3</hikari-cp.version>
11 changes: 11 additions & 0 deletions test/native/pom.xml
@@ -190,6 +190,17 @@
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-mapreduce-client-core</artifactId>
<version>${hadoop.version}</version>
<exclusions>
<exclusion>
<groupId>*</groupId>
<artifactId>*</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>org.testcontainers</groupId>
<artifactId>junit-jupiter</artifactId>
@@ -0,0 +1,122 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

package org.apache.shardingsphere.test.natived.jdbc.databases.hive;

import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;
import org.apache.shardingsphere.test.natived.commons.TestShardingService;
import org.awaitility.Awaitility;
import org.junit.jupiter.api.AfterAll;
import org.junit.jupiter.api.BeforeAll;
import org.junit.jupiter.api.Test;
import org.junit.jupiter.api.condition.EnabledInNativeImage;
import org.testcontainers.containers.GenericContainer;
import org.testcontainers.containers.Network;
import org.testcontainers.junit.jupiter.Container;
import org.testcontainers.junit.jupiter.Testcontainers;

import javax.sql.DataSource;
import java.nio.file.Paths;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Statement;
import java.time.Duration;
import java.util.Properties;

import static org.hamcrest.MatcherAssert.assertThat;
import static org.hamcrest.Matchers.is;
import static org.hamcrest.Matchers.nullValue;

@SuppressWarnings({"SqlDialectInspection", "SqlNoDataSourceInspection", "resource"})
@EnabledInNativeImage
@Testcontainers
class StandaloneMetastoreTest {
    
    private static final Network NETWORK = Network.newNetwork();
    
    @Container
    public static final GenericContainer<?> HMS_CONTAINER = new GenericContainer<>("apache/hive:4.0.1")
            .withEnv("SERVICE_NAME", "metastore")
            .withNetwork(NETWORK)
            .withNetworkAliases("metastore");
    
    @Container
    public static final GenericContainer<?> HS2_CONTAINER = new GenericContainer<>("apache/hive:4.0.1")
            .withEnv("SERVICE_NAME", "hiveserver2")
            .withEnv("SERVICE_OPTS", "-Dhive.metastore.uris=thrift://metastore:9083")
            .withNetwork(NETWORK)
            .withExposedPorts(10000)
            .dependsOn(HMS_CONTAINER);
    
    private static final String SYSTEM_PROP_KEY_PREFIX = "fixture.test-native.yaml.database.hive.hms.";
    
    // Due to https://issues.apache.org/jira/browse/HIVE-28317 , the `initFile` parameter of HiveServer2 JDBC Driver must be an absolute path.
    private static final String ABSOLUTE_PATH = Paths.get("src/test/resources/test-native/sql/test-native-databases-hive-iceberg.sql").toAbsolutePath().toString();
    
    private String jdbcUrlPrefix;
    
    @BeforeAll
    static void beforeAll() {
        assertThat(System.getProperty(SYSTEM_PROP_KEY_PREFIX + "ds0.jdbc-url"), is(nullValue()));
        assertThat(System.getProperty(SYSTEM_PROP_KEY_PREFIX + "ds1.jdbc-url"), is(nullValue()));
        assertThat(System.getProperty(SYSTEM_PROP_KEY_PREFIX + "ds2.jdbc-url"), is(nullValue()));
    }
    
    @AfterAll
    static void afterAll() {
        NETWORK.close();
        System.clearProperty(SYSTEM_PROP_KEY_PREFIX + "ds0.jdbc-url");
        System.clearProperty(SYSTEM_PROP_KEY_PREFIX + "ds1.jdbc-url");
        System.clearProperty(SYSTEM_PROP_KEY_PREFIX + "ds2.jdbc-url");
    }
    
    @Test
    void assertShardingInLocalTransactions() throws SQLException {
        jdbcUrlPrefix = "jdbc:hive2://localhost:" + HS2_CONTAINER.getMappedPort(10000) + "/";
        DataSource dataSource = createDataSource();
        TestShardingService testShardingService = new TestShardingService(dataSource);
        testShardingService.processSuccessInHive();
    }
    
    private Connection openConnection() throws SQLException {
        Properties props = new Properties();
        return DriverManager.getConnection(jdbcUrlPrefix, props);
    }
    
    private DataSource createDataSource() throws SQLException {
        Awaitility.await().atMost(Duration.ofMinutes(1L)).ignoreExceptions().until(() -> {
            openConnection().close();
            return true;
        });
        try (
                Connection connection = openConnection();
                Statement statement = connection.createStatement()) {
            statement.executeUpdate("CREATE DATABASE demo_ds_0");
            statement.executeUpdate("CREATE DATABASE demo_ds_1");
            statement.executeUpdate("CREATE DATABASE demo_ds_2");
        }
        HikariConfig config = new HikariConfig();
        config.setDriverClassName("org.apache.shardingsphere.driver.ShardingSphereDriver");
        config.setJdbcUrl("jdbc:shardingsphere:classpath:test-native/yaml/jdbc/databases/hive/standalone-hms.yaml?placeholder-type=system_props");
        System.setProperty(SYSTEM_PROP_KEY_PREFIX + "ds0.jdbc-url", jdbcUrlPrefix + "demo_ds_0" + ";initFile=" + ABSOLUTE_PATH);
        System.setProperty(SYSTEM_PROP_KEY_PREFIX + "ds1.jdbc-url", jdbcUrlPrefix + "demo_ds_1" + ";initFile=" + ABSOLUTE_PATH);
        System.setProperty(SYSTEM_PROP_KEY_PREFIX + "ds2.jdbc-url", jdbcUrlPrefix + "demo_ds_2" + ";initFile=" + ABSOLUTE_PATH);
        return new HikariDataSource(config);
    }
}
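
Reviewer note: the test above builds each data source URL by appending the database name and an absolute `initFile` path to the HiveServer2 prefix. The sketch below isolates that string construction so it can be checked without containers. It is an illustration only; `HiveJdbcUrlSketch` and `buildJdbcUrl` are hypothetical names and do not exist in the commit.

```java
import java.nio.file.Paths;

public final class HiveJdbcUrlSketch {

    private HiveJdbcUrlSketch() {
    }

    // Hypothetical helper mirroring the concatenation in the test:
    // "jdbc:hive2://<host>:<port>/" + database + ";initFile=" + absolute path.
    static String buildJdbcUrl(final String host, final int port, final String database, final String initFilePath) {
        // The test's comment cites HIVE-28317 as the reason `initFile` must be an absolute path.
        String absoluteInitFile = Paths.get(initFilePath).toAbsolutePath().toString();
        return "jdbc:hive2://" + host + ":" + port + "/" + database + ";initFile=" + absoluteInitFile;
    }

    public static void main(final String[] args) {
        // Port 10000 is the container port exposed by the test; the mapped host port will differ at runtime.
        System.out.println(buildJdbcUrl("localhost", 10000, "demo_ds_0",
                "src/test/resources/test-native/sql/test-native-databases-hive-iceberg.sql"));
    }
}
```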
@@ -71,7 +71,7 @@ class ZookeeperServiceDiscoveryTest {
*/
@SuppressWarnings("unused")
@Container
- private static final GenericContainer<?> HIVE_SERVER2_1_CONTAINER = new FixedHostPortGenericContainer<>("apache/hive:4.0.1")
+ private static final GenericContainer<?> HS2_1_CONTAINER = new FixedHostPortGenericContainer<>("apache/hive:4.0.1")
.withNetwork(NETWORK)
.withEnv("SERVICE_NAME", "hiveserver2")
.withEnv("SERVICE_OPTS", "-Dhive.server2.support.dynamic.service.discovery=true" + " "
@@ -116,10 +116,10 @@ void assertShardingInLocalTransactions() throws SQLException {
DataSource dataSource = createDataSource();
TestShardingService testShardingService = new TestShardingService(dataSource);
testShardingService.processSuccessInHive();
- HIVE_SERVER2_1_CONTAINER.stop();
+ HS2_1_CONTAINER.stop();
int randomPortSecond = InstanceSpec.getRandomPort();
try (
- GenericContainer<?> hiveServer2SecondContainer = new FixedHostPortGenericContainer<>("apache/hive:4.0.1")
+ GenericContainer<?> hs2SecondContainer = new FixedHostPortGenericContainer<>("apache/hive:4.0.1")
.withNetwork(NETWORK)
.withEnv("SERVICE_NAME", "hiveserver2")
.withEnv("SERVICE_OPTS", "-Dhive.server2.support.dynamic.service.discovery=true" + " "
@@ -128,8 +128,8 @@ void assertShardingInLocalTransactions() throws SQLException {
+ "-Dhive.server2.thrift.port=" + randomPortSecond)
.withFixedExposedPort(randomPortSecond, randomPortSecond)
.dependsOn(ZOOKEEPER_CONTAINER)) {
- hiveServer2SecondContainer.start();
- extracted(hiveServer2SecondContainer.getMappedPort(randomPortSecond));
+ hs2SecondContainer.start();
+ extracted(hs2SecondContainer.getMappedPort(randomPortSecond));
testShardingService.processSuccessInHive();
}
}
@@ -140,7 +140,7 @@ private Connection openConnection() throws SQLException {
}

private DataSource createDataSource() throws SQLException {
- extracted(HIVE_SERVER2_1_CONTAINER.getMappedPort(RANDOM_PORT_FIRST));
+ extracted(HS2_1_CONTAINER.getMappedPort(RANDOM_PORT_FIRST));
HikariConfig config = new HikariConfig();
config.setDriverClassName("org.apache.shardingsphere.driver.ShardingSphereDriver");
config.setJdbcUrl("jdbc:shardingsphere:classpath:test-native/yaml/jdbc/databases/hive/zsd.yaml?placeholder-type=system_props");
@@ -0,0 +1,64 @@
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

dataSources:
  ds_0:
    dataSourceClassName: com.zaxxer.hikari.HikariDataSource
    driverClassName: org.apache.hive.jdbc.HiveDriver
    jdbcUrl: $${fixture.test-native.yaml.database.hive.hms.ds0.jdbc-url::}
  ds_1:
    dataSourceClassName: com.zaxxer.hikari.HikariDataSource
    driverClassName: org.apache.hive.jdbc.HiveDriver
    jdbcUrl: $${fixture.test-native.yaml.database.hive.hms.ds1.jdbc-url::}
  ds_2:
    dataSourceClassName: com.zaxxer.hikari.HikariDataSource
    driverClassName: org.apache.hive.jdbc.HiveDriver
    jdbcUrl: $${fixture.test-native.yaml.database.hive.hms.ds2.jdbc-url::}

rules:
- !SHARDING
  tables:
    t_order:
      actualDataNodes: <LITERAL>ds_0.t_order, ds_1.t_order, ds_2.t_order
      keyGenerateStrategy:
        column: order_id
        keyGeneratorName: snowflake
    t_order_item:
      actualDataNodes: <LITERAL>ds_0.t_order_item, ds_1.t_order_item, ds_2.t_order_item
      keyGenerateStrategy:
        column: order_item_id
        keyGeneratorName: snowflake
  defaultDatabaseStrategy:
    standard:
      shardingColumn: user_id
      shardingAlgorithmName: inline
  shardingAlgorithms:
    inline:
      type: CLASS_BASED
      props:
        strategy: STANDARD
        algorithmClassName: org.apache.shardingsphere.test.natived.commons.algorithm.ClassBasedInlineShardingAlgorithmFixture
  keyGenerators:
    snowflake:
      type: SNOWFLAKE
  auditors:
    sharding_key_required_auditor:
      type: DML_SHARDING_CONDITIONS

- !BROADCAST
  tables:
    - t_address
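
Reviewer note: the `jdbcUrl` values in this fixture use `$${key::default}` placeholders, and the test loads the file with `placeholder-type=system_props` after setting matching system properties. The sketch below shows one plausible reading of that substitution, with the text after `::` treated as the fallback value. This is an assumption made for illustration and is not ShardingSphere's actual implementation; `SystemPropsPlaceholderSketch` and `resolve` are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public final class SystemPropsPlaceholderSketch {

    // Matches $${key::default}; group 1 is the property key, group 2 the (possibly empty) default.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\$\\{([^:}]+)::([^}]*)\\}");

    private SystemPropsPlaceholderSketch() {
    }

    // Hypothetical helper: replaces each placeholder with the system property value,
    // falling back to the default after "::" when the property is not set.
    static String resolve(final String line) {
        Matcher matcher = PLACEHOLDER.matcher(line);
        StringBuffer result = new StringBuffer();
        while (matcher.find()) {
            String value = System.getProperty(matcher.group(1), matcher.group(2));
            // quoteReplacement keeps '$' and '\' in the value from being read as group references.
            matcher.appendReplacement(result, Matcher.quoteReplacement(value));
        }
        matcher.appendTail(result);
        return result.toString();
    }

    public static void main(final String[] args) {
        String key = "fixture.test-native.yaml.database.hive.hms.ds0.jdbc-url";
        System.setProperty(key, "jdbc:hive2://localhost:10000/demo_ds_0");
        System.out.println(resolve("jdbcUrl: $${" + key + "::}"));
        System.clearProperty(key);
    }
}
```

Under this reading, the empty default in the fixture means an unset property resolves to an empty `jdbcUrl`, which would explain why the test sets all three properties before creating the `HikariDataSource`.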
