[FLINK-29710][Filesystem] Bump minimum supported Hadoop version to 2.10.2

- Make the SQL Client use the Hadoop versioning as defined in the parent POM
- Make the SQL Gateway use the Hive and Hadoop versioning as defined in the parent POM
- Move and clarify Hive specific Hadoop versioning in `flink-connector-hive`
- Sync stax2-api to solve dependency convergence in hadoop-common
- Sync commons-beanutils and stax2-api to solve dependency convergence in hadoop-common (pinned via `dependencyManagement`; see the sketch below)
- Fix YarnTestBase to work with the new Hadoop version, and remove the stale comment about an issue with the previously supported Hadoop version
- Disable the HDFS client's replacement of failed datanodes, so that no new datanode is added when an existing one is removed. It is recommended to disable this for small clusters, which is the case in our tests (see the sketch after this list)
- Bump Hadoop 3 to 3.2.3.
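
For reference, the HDFS client change above boils down to the following override (a minimal sketch mirroring the `hdfs-site.xml` test resource added under `flink-filesystems/flink-hadoop-fs/src/test/resources/` in this commit):

```xml
<configuration>
  <!-- Do not try to replace a failed datanode in the write pipeline; on the tiny test
       clusters used here no replacement datanode exists, so replacement attempts would
       only produce an unusually high rate of pipeline failures. -->
  <property>
    <name>dfs.client.block.write.replace-datanode-on-failure.enable</name>
    <value>never</value>
  </property>
  <property>
    <name>dfs.client.block.write.replace-datanode-on-failure.policy</name>
    <value>never</value>
  </property>
</configuration>
```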

HADOOP-12984 changed where the MiniYARNCluster wrote to by default.

2.10.2: https://github.com/apache/hadoop/blob/965fd380006fa78b2315668fbc7eb432e1d8200f/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/MiniYARNCluster.java#L177

This hasn't made its way into 3.1.3: https://github.com/apache/hadoop/blob/aa96f1871bfd858f9bac59cf2a81ec470da649af/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/MiniYARNCluster.java#L175

It was, however, added to 3.2.3 in MAPREDUCE-7320.
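
Similarly, the dependency-convergence fixes listed above amount to pinning the conflicting transitive versions via `dependencyManagement`. As a sketch, the `stax2-api` pin added for the SQL end-to-end tests looks like this (coordinates as in the diff below):

```xml
<dependencyManagement>
  <dependencies>
    <dependency>
      <!-- dependency convergence: align the stax2-api version pulled in via hadoop-common -->
      <groupId>org.codehaus.woodstox</groupId>
      <artifactId>stax2-api</artifactId>
      <version>4.2.1</version>
      <scope>test</scope>
    </dependency>
  </dependencies>
</dependencyManagement>
```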

Co-authored-by: Gabor Somogyi <[email protected]>
Co-authored-by: Chesnay Schepler <[email protected]>
3 people committed Jan 10, 2023
1 parent 9bb6500 commit 573ed92
Showing 25 changed files with 242 additions and 96 deletions.
4 changes: 2 additions & 2 deletions azure-pipelines.yml
@@ -77,7 +77,7 @@ stages:
vmImage: 'ubuntu-20.04'
e2e_pool_definition:
vmImage: 'ubuntu-20.04'
environment: PROFILE="-Dflink.hadoop.version=2.8.5 -Dscala-2.12"
environment: PROFILE="-Dflink.hadoop.version=2.10.2 -Dscala-2.12"
run_end_to_end: false
container: flink-build-container
jdk: 8
@@ -97,5 +97,5 @@ stages:
- template: tools/azure-pipelines/build-python-wheels.yml
parameters:
stage_name: cron_python_wheels
environment: PROFILE="-Dflink.hadoop.version=2.8.5 -Dscala-2.12"
environment: PROFILE="-Dflink.hadoop.version=2.10.2 -Dscala-2.12"
container: flink-build-container
2 changes: 1 addition & 1 deletion docs/content.zh/docs/connectors/dataset/formats/hadoop.md
@@ -49,7 +49,7 @@ a `hadoop-client` dependency such as:
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-client</artifactId>
<version>2.8.5</version>
<version>2.10.2</version>
<scope>provided</scope>
</dependency>
```
@@ -48,7 +48,7 @@ under the License.
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-client</artifactId>
<version>2.8.5</version>
<version>2.10.2</version>
<scope>provided</scope>
</dependency>
```
2 changes: 1 addition & 1 deletion docs/content.zh/docs/deployment/filesystems/gcs.md
@@ -68,7 +68,7 @@ You must include the following jars in Flink's `lib` directory to connect Flink
</dependency>
```

We have tested with `flink-shared-hadoop2-uber` version >= `2.8.5-1.8.3`.
We have tested with `flink-shared-hadoop2-uber` version >= `2.10.2-1.8.3`.
You can track the latest version of the [gcs-connector hadoop 2](https://storage.googleapis.com/hadoop-lib/gcs/gcs-connector-latest-hadoop2.jar).

### Authentication to access GCS
4 changes: 2 additions & 2 deletions docs/content.zh/docs/deployment/resource-providers/yarn.md
@@ -40,7 +40,7 @@ Flink can dynamically allocate and de-allocate TaskManager resources depending o

### Preparation

This *Getting Started* section assumes a functional YARN environment, starting from version 2.8.5. YARN environments are provided most conveniently through services such as Amazon EMR, Google Cloud DataProc or products like Cloudera. [Manually setting up a YARN environment locally](https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/SingleCluster.html) or [on a cluster](https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/ClusterSetup.html) is not recommended for following through this *Getting Started* tutorial.
This *Getting Started* section assumes a functional YARN environment, starting from version 2.10.2. YARN environments are provided most conveniently through services such as Amazon EMR, Google Cloud DataProc or products like Cloudera. [Manually setting up a YARN environment locally](https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/SingleCluster.html) or [on a cluster](https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/ClusterSetup.html) is not recommended for following through this *Getting Started* tutorial.

- Make sure your YARN cluster is ready for accepting Flink applications by running `yarn top`. It should show no error messages.
- Download a recent Flink distribution from the [download page]({{< downloads >}}) and unpack it.
@@ -219,7 +219,7 @@ Hadoop YARN 2.4.0 has a major bug (fixed in 2.5.0) preventing container restarts

### Supported Hadoop versions.

Flink on YARN is compiled against Hadoop 2.8.5, and all Hadoop versions `>= 2.8.5` are supported, including Hadoop 3.x.
Flink on YARN is compiled against Hadoop 2.10.2, and all Hadoop versions `>= 2.10.2` are supported, including Hadoop 3.x.

For providing Flink with the required Hadoop dependencies, we recommend setting the `HADOOP_CLASSPATH` environment variable already introduced in the [Getting Started / Preparation](#preparation) section.

2 changes: 1 addition & 1 deletion docs/content.zh/docs/dev/dataset/hadoop_compatibility.md
@@ -88,7 +88,7 @@ a `hadoop-client` dependency such as:
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-client</artifactId>
<version>2.8.5</version>
<version>2.10.2</version>
<scope>provided</scope>
</dependency>
```
2 changes: 1 addition & 1 deletion docs/content/docs/connectors/dataset/formats/hadoop.md
@@ -49,7 +49,7 @@ a `hadoop-client` dependency such as:
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-client</artifactId>
<version>2.8.5</version>
<version>2.10.2</version>
<scope>provided</scope>
</dependency>
```
2 changes: 1 addition & 1 deletion docs/content/docs/connectors/datastream/formats/hadoop.md
@@ -50,7 +50,7 @@ a `hadoop-client` dependency such as:
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-client</artifactId>
<version>2.8.5</version>
<version>2.10.2</version>
<scope>provided</scope>
</dependency>
```
4 changes: 2 additions & 2 deletions docs/content/docs/deployment/resource-providers/yarn.md
@@ -40,7 +40,7 @@ Flink can dynamically allocate and de-allocate TaskManager resources depending o

### Preparation

This *Getting Started* section assumes a functional YARN environment, starting from version 2.8.5. YARN environments are provided most conveniently through services such as Amazon EMR, Google Cloud DataProc or products like Cloudera. [Manually setting up a YARN environment locally](https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/SingleCluster.html) or [on a cluster](https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/ClusterSetup.html) is not recommended for following through this *Getting Started* tutorial.
This *Getting Started* section assumes a functional YARN environment, starting from version 2.10.2. YARN environments are provided most conveniently through services such as Amazon EMR, Google Cloud DataProc or products like Cloudera. [Manually setting up a YARN environment locally](https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/SingleCluster.html) or [on a cluster](https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/ClusterSetup.html) is not recommended for following through this *Getting Started* tutorial.

- Make sure your YARN cluster is ready for accepting Flink applications by running `yarn top`. It should show no error messages.
- Download a recent Flink distribution from the [download page]({{< downloads >}}) and unpack it.
@@ -235,7 +235,7 @@ Hadoop YARN 2.4.0 has a major bug (fixed in 2.5.0) preventing container restarts

### Supported Hadoop versions.

Flink on YARN is compiled against Hadoop 2.8.5, and all Hadoop versions `>= 2.8.5` are supported, including Hadoop 3.x.
Flink on YARN is compiled against Hadoop 2.10.2, and all Hadoop versions `>= 2.10.2` are supported, including Hadoop 3.x.

For providing Flink with the required Hadoop dependencies, we recommend setting the `HADOOP_CLASSPATH` environment variable already introduced in the [Getting Started / Preparation](#preparation) section.

2 changes: 1 addition & 1 deletion docs/content/docs/dev/dataset/hadoop_map_reduce.md
@@ -76,7 +76,7 @@ a `hadoop-client` dependency such as:
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-client</artifactId>
<version>2.8.5</version>
<version>2.10.2</version>
<scope>provided</scope>
</dependency>
```
73 changes: 28 additions & 45 deletions flink-connectors/flink-connector-hive/pom.xml
@@ -39,15 +39,20 @@ under the License.
<reflections.version>0.9.8</reflections.version>
<derby.version>10.10.2.0</derby.version>
<hive.avro.version>1.8.2</hive.avro.version>
<!--
Hive requires Hadoop 2 to avoid
java.lang.NoClassDefFoundError: org/apache/hadoop/metrics/Updater errors
Using this dedicated property avoids CI failures with the Hadoop 3 profile
-->
<hive.hadoop.version>2.10.2</hive.hadoop.version>
</properties>

<!-- Overwrite hadoop dependency management from flink-parent to use locally defined Hadoop version -->
<dependencyManagement>
<dependencies>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-common</artifactId>
<version>${hivemetastore.hadoop.version}</version>
<version>${hive.hadoop.version}</version>
<exclusions>
<exclusion>
<groupId>log4j</groupId>
@@ -79,7 +84,7 @@ under the License.
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-mapreduce-client-core</artifactId>
<version>${hivemetastore.hadoop.version}</version>
<version>${hive.hadoop.version}</version>
<exclusions>
<exclusion>
<groupId>log4j</groupId>
@@ -95,7 +100,7 @@ under the License.
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-yarn-common</artifactId>
<version>${hivemetastore.hadoop.version}</version>
<version>${hive.hadoop.version}</version>
<exclusions>
<exclusion>
<groupId>log4j</groupId>
@@ -111,7 +116,7 @@ under the License.
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-yarn-client</artifactId>
<version>${hivemetastore.hadoop.version}</version>
<version>${hive.hadoop.version}</version>
<exclusions>
<exclusion>
<groupId>log4j</groupId>
@@ -254,6 +259,24 @@ under the License.
<scope>provided</scope>
</dependency>

<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-hdfs</artifactId>
<version>${hive.hadoop.version}</version>
<type>test-jar</type>
<scope>test</scope>
<exclusions>
<exclusion>
<groupId>log4j</groupId>
<artifactId>log4j</artifactId>
</exclusion>
<exclusion>
<groupId>ch.qos.reload4j</groupId>
<artifactId>reload4j</artifactId>
</exclusion>
</exclusions>
</dependency>

<!-- Hive -->

<!-- Note: Hive published jars do not have proper dependencies declared.
@@ -910,13 +933,6 @@ under the License.
<scope>test</scope>
</dependency>

<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-hdfs</artifactId>
<version>${hivemetastore.hadoop.version}</version>
<scope>test</scope>
</dependency>

<!-- ArchUit test dependencies -->

<dependency>
@@ -1115,33 +1131,8 @@ under the License.
<properties>
<hive.version>3.1.3</hive.version>
<derby.version>10.14.1.0</derby.version>
<!-- need a hadoop version that fixes HADOOP-14683 -->
<hivemetastore.hadoop.version>2.8.2</hivemetastore.hadoop.version>
</properties>

<dependencyManagement>
<dependencies>
<dependency>
<groupId>org.apache.hive</groupId>
<artifactId>hive-metastore</artifactId>
<version>${hive.version}</version>
<scope>provided</scope>
<exclusions>
<exclusion>
<!-- Override arrow netty dependency -->
<groupId>io.netty</groupId>
<artifactId>netty-buffer</artifactId>
</exclusion>
<exclusion>
<!-- Override arrow netty dependency -->
<groupId>io.netty</groupId>
<artifactId>netty-common</artifactId>
</exclusion>
</exclusions>
</dependency>
</dependencies>
</dependencyManagement>

<dependencies>
<dependency>
<!-- Bump arrow netty dependency -->
@@ -1158,14 +1149,6 @@ under the License.
<version>4.1.46.Final</version>
<scope>provided</scope>
</dependency>

<!-- Required by orc tests -->
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-hdfs</artifactId>
<version>${hivemetastore.hadoop.version}</version>
<scope>test</scope>
</dependency>
</dependencies>

</profile>
@@ -150,8 +150,8 @@ private void configureMrExecutionEngine() {
private void configureJavaSecurityRealm() {
// These three properties gets rid of: 'Unable to load realm info from SCDynamicStore'
// which seems to have a timeout of about 5 secs.
System.setProperty("java.security.krb5.realm", "");
System.setProperty("java.security.krb5.kdc", "");
System.setProperty("java.security.krb5.realm", "EXAMPLE.COM");
System.setProperty("java.security.krb5.kdc", "kdc");
System.setProperty("java.security.krb5.conf", "/dev/null");
}

12 changes: 12 additions & 0 deletions flink-end-to-end-tests/flink-end-to-end-tests-sql/pom.xml
@@ -158,6 +158,18 @@
</dependency>
</dependencies>

<dependencyManagement>
<dependencies>
<dependency>
<!-- dependency convergence -->
<groupId>org.codehaus.woodstox</groupId>
<artifactId>stax2-api</artifactId>
<version>4.2.1</version>
<scope>test</scope>
</dependency>
</dependencies>
</dependencyManagement>

<build>
<plugins>
<plugin>
6 changes: 0 additions & 6 deletions flink-end-to-end-tests/flink-sql-gateway-test/pom.xml
@@ -30,12 +30,6 @@ under the License.
<name>Flink : E2E Tests : SQL Gateway</name>
<packaging>jar</packaging>

<properties>
<!-- The test container uses hive-2.1.0 -->
<hive.version>2.3.9</hive.version>
<flink.hadoop.version>2.8.5</flink.hadoop.version>
</properties>

<dependencies>
<dependency>
<groupId>org.apache.flink</groupId>
2 changes: 1 addition & 1 deletion flink-end-to-end-tests/test-scripts/common_yarn_docker.sh
@@ -99,7 +99,7 @@ function start_hadoop_cluster() {
function build_image() {
echo "Pre-downloading Hadoop tarball"
local cache_path
cache_path=$(get_artifact "http://archive.apache.org/dist/hadoop/common/hadoop-2.8.5/hadoop-2.8.5.tar.gz")
cache_path=$(get_artifact "http://archive.apache.org/dist/hadoop/common/hadoop-2.10.2/hadoop-2.10.2.tar.gz")
ln "${cache_path}" "${END_TO_END_DIR}/test-scripts/docker-hadoop-secure-cluster/hadoop/hadoop.tar.gz"

echo "Building Hadoop Docker container"
@@ -4,7 +4,7 @@ Required versions
-----------------

* JDK8
* Hadoop 2.8.5
* Hadoop 2.10.2

Default Environment Variables
-----------------------------
@@ -24,7 +24,7 @@ Run image

```
cd flink-end-to-end-tests/test-scripts/docker-hadoop-secure-cluster
wget -O hadoop/hadoop.tar.gz https://archive.apache.org/dist/hadoop/common/hadoop-2.8.5/hadoop-2.8.5.tar.gz
wget -O hadoop/hadoop.tar.gz https://archive.apache.org/dist/hadoop/common/hadoop-2.10.2/hadoop-2.10.2.tar.gz
docker-compose build
docker-compose up
```
@@ -42,8 +42,8 @@ public class HadoopUtilsTest extends TestLogger {

@BeforeClass
public static void setPropertiesToEnableKerberosConfigInit() throws KrbException {
System.setProperty("java.security.krb5.realm", "");
System.setProperty("java.security.krb5.kdc", "");
System.setProperty("java.security.krb5.realm", "EXAMPLE.COM");
System.setProperty("java.security.krb5.kdc", "kdc");
System.setProperty("java.security.krb5.conf", "/dev/null");
sun.security.krb5.Config.refresh();
}
37 changes: 37 additions & 0 deletions flink-filesystems/flink-hadoop-fs/src/test/resources/hdfs-site.xml
@@ -0,0 +1,37 @@
<?xml version="1.0" encoding="UTF-8"?>
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
<configuration>
<!-- dfs.client.block.write.replace-datanode-on-failure.enable and
dfs.client.block.write.replace-datanode-on-failure.policy are introduced as part of FLINK-29710
When the cluster size is extremely small, e.g. 3 nodes or less, cluster
administrators may want to set the policy to NEVER in the default
configuration file or disable this feature. Otherwise, users may
experience an unusually high rate of pipeline failures since it is
impossible to find new datanodes for replacement.
-->
<property>
<name>dfs.client.block.write.replace-datanode-on-failure.enable</name>
<value>never</value>
</property>

<property>
<name>dfs.client.block.write.replace-datanode-on-failure.policy</name>
<value>never</value>
</property>
</configuration>
