diff --git a/connectors/AUTHORS b/connectors/AUTHORS deleted file mode 100644 index 3d97a015bc2..00000000000 --- a/connectors/AUTHORS +++ /dev/null @@ -1,8 +0,0 @@ -# This is the official list of the Delta Lake Project Authors for copyright purposes. - -# Names should be added to this file as: -# Name or Organization -# The email address is not required for organizations. - -Databricks -Scribd Inc diff --git a/connectors/CONTRIBUTING.md b/connectors/CONTRIBUTING.md deleted file mode 100644 index ba95630a4fe..00000000000 --- a/connectors/CONTRIBUTING.md +++ /dev/null @@ -1,74 +0,0 @@ -We happily welcome contributions to Delta Lake Connectors. We use [GitHub Issues](/../../issues/) to track community reported issues and [GitHub Pull Requests ](/../../pulls/) for accepting changes. - -# Governance -Delta lake governance is conducted by the Technical Steering Committee (TSC), which is currently composed of the following members: - - Michael Armbrust (michael.armbrust@gmail.com) - - Reynold Xin (reynoldx@gmail.com) - - Matei Zaharia (matei@cs.stanford.edu) - -The founding technical charter can be found [here](https://delta.io/pdfs/delta-charter.pdf). - -# Communication -Before starting work on a major feature, please reach out to us via GitHub, Slack, email, etc. We will make sure no one else is already working on it and ask you to open a GitHub issue. -A "major feature" is defined as any change that is > 100 LOC altered (not including tests), or changes any user-facing behavior. -We will use the GitHub issue to discuss the feature and come to agreement. -This is to prevent your time being wasted, as well as ours. -The GitHub review process for major features is also important so that organizations with commit access can come to agreement on design. -If it is appropriate to write a design document, the document must be hosted either in the GitHub tracking issue, or linked to from the issue and hosted in a world-readable location. -Specifically, if the goal is to add a new extension, please read the extension policy. -Small patches and bug fixes don't need prior communication. - -# Coding style -We generally follow the Apache Spark Scala Style Guide. - -# Sign your work -The sign-off is a simple line at the end of the explanation for the patch. Your signature certifies that you wrote the patch or otherwise have the right to pass it on as an open-source patch. The rules are pretty simple: if you can certify the below (from developercertificate.org): - -``` -Developer Certificate of Origin -Version 1.1 - -Copyright (C) 2004, 2006 The Linux Foundation and its contributors. -1 Letterman Drive -Suite D4700 -San Francisco, CA, 94129 - -Everyone is permitted to copy and distribute verbatim copies of this -license document, but changing it is not allowed. 
- - -Developer's Certificate of Origin 1.1 - -By making a contribution to this project, I certify that: - -(a) The contribution was created in whole or in part by me and I - have the right to submit it under the open source license - indicated in the file; or - -(b) The contribution is based upon previous work that, to the best - of my knowledge, is covered under an appropriate open source - license and I have the right under that license to submit that - work with modifications, whether created in whole or in part - by me, under the same open source license (unless I am - permitted to submit under a different license), as indicated - in the file; or - -(c) The contribution was provided directly to me by some other - person who certified (a), (b) or (c) and I have not modified - it. - -(d) I understand and agree that this project and the contribution - are public and that a record of the contribution (including all - personal information I submit with it, including my sign-off) is - maintained indefinitely and may be redistributed consistent with - this project or the open source license(s) involved. -``` - -Then you just add a line to every git commit message: - -``` -Signed-off-by: Joe Smith -Use your real name (sorry, no pseudonyms or anonymous contributions.) -``` - -If you set your `user.name` and `user.email` git configs, you can sign your commit automatically with git commit -s. diff --git a/connectors/LICENSE.txt b/connectors/LICENSE.txt deleted file mode 100644 index a1ceba72234..00000000000 --- a/connectors/LICENSE.txt +++ /dev/null @@ -1,198 +0,0 @@ -Copyright (2020-present) The Delta Lake Project Authors. All rights reserved. - - - Apache License - Version 2.0, January 2004 - http://www.apache.org/licenses/ - - TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION - - 1. Definitions. - - "License" shall mean the terms and conditions for use, reproduction, - and distribution as defined by Sections 1 through 9 of this document. - - "Licensor" shall mean the copyright owner or entity authorized by - the copyright owner that is granting the License. - - "Legal Entity" shall mean the union of the acting entity and all - other entities that control, are controlled by, or are under common - control with that entity. For the purposes of this definition, - "control" means (i) the power, direct or indirect, to cause the - direction or management of such entity, whether by contract or - otherwise, or (ii) ownership of fifty percent (50%) or more of the - outstanding shares, or (iii) beneficial ownership of such entity. - - "You" (or "Your") shall mean an individual or Legal Entity - exercising permissions granted by this License. - - "Source" form shall mean the preferred form for making modifications, - including but not limited to software source code, documentation - source, and configuration files. - - "Object" form shall mean any form resulting from mechanical - transformation or translation of a Source form, including but - not limited to compiled object code, generated documentation, - and conversions to other media types. - - "Work" shall mean the work of authorship, whether in Source or - Object form, made available under the License, as indicated by a - copyright notice that is included in or attached to the work - (an example is provided in the Appendix below). 
- - "Derivative Works" shall mean any work, whether in Source or Object - form, that is based on (or derived from) the Work and for which the - editorial revisions, annotations, elaborations, or other modifications - represent, as a whole, an original work of authorship. For the purposes - of this License, Derivative Works shall not include works that remain - separable from, or merely link (or bind by name) to the interfaces of, - the Work and Derivative Works thereof. - - "Contribution" shall mean any work of authorship, including - the original version of the Work and any modifications or additions - to that Work or Derivative Works thereof, that is intentionally - submitted to Licensor for inclusion in the Work by the copyright owner - or by an individual or Legal Entity authorized to submit on behalf of - the copyright owner. For the purposes of this definition, "submitted" - means any form of electronic, verbal, or written communication sent - to the Licensor or its representatives, including but not limited to - communication on electronic mailing lists, source code control systems, - and issue tracking systems that are managed by, or on behalf of, the - Licensor for the purpose of discussing and improving the Work, but - excluding communication that is conspicuously marked or otherwise - designated in writing by the copyright owner as "Not a Contribution." - - "Contributor" shall mean Licensor and any individual or Legal Entity - on behalf of whom a Contribution has been received by Licensor and - subsequently incorporated within the Work. - - 2. Grant of Copyright License. Subject to the terms and conditions of - this License, each Contributor hereby grants to You a perpetual, - worldwide, non-exclusive, no-charge, royalty-free, irrevocable - copyright license to reproduce, prepare Derivative Works of, - publicly display, publicly perform, sublicense, and distribute the - Work and such Derivative Works in Source or Object form. - - 3. Grant of Patent License. Subject to the terms and conditions of - this License, each Contributor hereby grants to You a perpetual, - worldwide, non-exclusive, no-charge, royalty-free, irrevocable - (except as stated in this section) patent license to make, have made, - use, offer to sell, sell, import, and otherwise transfer the Work, - where such license applies only to those patent claims licensable - by such Contributor that are necessarily infringed by their - Contribution(s) alone or by combination of their Contribution(s) - with the Work to which such Contribution(s) was submitted. If You - institute patent litigation against any entity (including a - cross-claim or counterclaim in a lawsuit) alleging that the Work - or a Contribution incorporated within the Work constitutes direct - or contributory patent infringement, then any patent licenses - granted to You under this License for that Work shall terminate - as of the date such litigation is filed. - - 4. Redistribution. 
You may reproduce and distribute copies of the - Work or Derivative Works thereof in any medium, with or without - modifications, and in Source or Object form, provided that You - meet the following conditions: - - (a) You must give any other recipients of the Work or - Derivative Works a copy of this License; and - - (b) You must cause any modified files to carry prominent notices - stating that You changed the files; and - - (c) You must retain, in the Source form of any Derivative Works - that You distribute, all copyright, patent, trademark, and - attribution notices from the Source form of the Work, - excluding those notices that do not pertain to any part of - the Derivative Works; and - - (d) If the Work includes a "NOTICE" text file as part of its - distribution, then any Derivative Works that You distribute must - include a readable copy of the attribution notices contained - within such NOTICE file, excluding those notices that do not - pertain to any part of the Derivative Works, in at least one - of the following places: within a NOTICE text file distributed - as part of the Derivative Works; within the Source form or - documentation, if provided along with the Derivative Works; or, - within a display generated by the Derivative Works, if and - wherever such third-party notices normally appear. The contents - of the NOTICE file are for informational purposes only and - do not modify the License. You may add Your own attribution - notices within Derivative Works that You distribute, alongside - or as an addendum to the NOTICE text from the Work, provided - that such additional attribution notices cannot be construed - as modifying the License. - - You may add Your own copyright statement to Your modifications and - may provide additional or different license terms and conditions - for use, reproduction, or distribution of Your modifications, or - for any such Derivative Works as a whole, provided Your use, - reproduction, and distribution of the Work otherwise complies with - the conditions stated in this License. - - 5. Submission of Contributions. Unless You explicitly state otherwise, - any Contribution intentionally submitted for inclusion in the Work - by You to the Licensor shall be under the terms and conditions of - this License, without any additional terms or conditions. - Notwithstanding the above, nothing herein shall supersede or modify - the terms of any separate license agreement you may have executed - with Licensor regarding such Contributions. - - 6. Trademarks. This License does not grant permission to use the trade - names, trademarks, service marks, or product names of the Licensor, - except as required for reasonable and customary use in describing the - origin of the Work and reproducing the content of the NOTICE file. - - 7. Disclaimer of Warranty. Unless required by applicable law or - agreed to in writing, Licensor provides the Work (and each - Contributor provides its Contributions) on an "AS IS" BASIS, - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or - implied, including, without limitation, any warranties or conditions - of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A - PARTICULAR PURPOSE. You are solely responsible for determining the - appropriateness of using or redistributing the Work and assume any - risks associated with Your exercise of permissions under this License. - - 8. Limitation of Liability. 
In no event and under no legal theory, - whether in tort (including negligence), contract, or otherwise, - unless required by applicable law (such as deliberate and grossly - negligent acts) or agreed to in writing, shall any Contributor be - liable to You for damages, including any direct, indirect, special, - incidental, or consequential damages of any character arising as a - result of this License or out of the use or inability to use the - Work (including but not limited to damages for loss of goodwill, - work stoppage, computer failure or malfunction, or any and all - other commercial damages or losses), even if such Contributor - has been advised of the possibility of such damages. - - 9. Accepting Warranty or Additional Liability. While redistributing - the Work or Derivative Works thereof, You may choose to offer, - and charge a fee for, acceptance of support, warranty, indemnity, - or other liability obligations and/or rights consistent with this - License. However, in accepting such obligations, You may act only - on Your own behalf and on Your sole responsibility, not on behalf - of any other Contributor, and only if You agree to indemnify, - defend, and hold each Contributor harmless for any liability - incurred by, or claims asserted against, such Contributor by reason - of your accepting any such warranty or additional liability. - - END OF TERMS AND CONDITIONS - - -------------------------------------------------------------------------- -This product bundles various third-party components under other open source licenses. -This section summarizes those components and their licenses. See licenses/ -for text of these licenses. - - -Apache Software Foundation License 2.0 --------------------------------------- - -standalone/src/main/java/io/delta/standalone/types/* -standalone/src/main/scala/io/delta/standalone/internal/util/DataTypeParser.scala - - -MIT License ------------ - -standalone/src/main/scala/io/delta/standalone/internal/data/RowParquetRecordImpl.scala diff --git a/connectors/NOTICE.txt b/connectors/NOTICE.txt deleted file mode 100644 index 1341a99ce5c..00000000000 --- a/connectors/NOTICE.txt +++ /dev/null @@ -1,24 +0,0 @@ -Delta Lake Connectors -Copyright (2020-present) The Delta Lake Project Authors. - -Licensed under the Apache License, Version 2.0 (the "License"); -you may not use this file except in compliance with the License. -You may obtain a copy of the License at - -http://www.apache.org/licenses/LICENSE-2.0 - -Unless required by applicable law or agreed to in writing, software -distributed under the License is distributed on an "AS IS" BASIS, -WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -See the License for the specific language governing permissions and -limitations under the License. - -This project includes software licensed by the Apache Software Foundation (Apache 2.0) -from the Apache Spark project (www.github.com/apache/spark) - ----------------------------------------------------------- -Apache Spark -Copyright 2014 and onwards The Apache Software Foundation. - -This product includes software developed at -The Apache Software Foundation (http://www.apache.org/). 
diff --git a/connectors/README.md b/connectors/README.md index 005d03bba98..77c1156ade2 100644 --- a/connectors/README.md +++ b/connectors/README.md @@ -1,24 +1,4 @@ -# Delta Lake Logo Connectors - -[![Test](https://github.com/delta-io/connectors/actions/workflows/test.yaml/badge.svg)](https://github.com/delta-io/connectors/actions/workflows/test.yaml) -[![License](https://img.shields.io/badge/license-Apache%202-brightgreen.svg)](https://github.com/delta-io/connectors/blob/master/LICENSE.txt) - -We are building connectors to bring [Delta Lake](https://delta.io) to popular big-data engines outside [Apache Spark](https://spark.apache.org) (e.g., [Apache Hive](https://hive.apache.org/), [Presto](https://prestodb.io/), [Apache Flink](https://flink.apache.org/)) and also to common reporting tools like [Microsoft Power BI](https://powerbi.microsoft.com/). - -# Introduction - -This is the repository for Delta Lake Connectors. It includes -- [Delta Standalone](https://docs.delta.io/latest/delta-standalone.html): a native library for reading and writing Delta Lake metadata. -- Connectors to popular big-data engines (e.g., [Apache Hive](https://hive.apache.org/), [Presto](https://prestodb.io/), [Apache Flink](https://flink.apache.org/)) and to common reporting tools like [Microsoft Power BI](https://powerbi.microsoft.com/). - -Please refer to the main [Delta Lake](https://github.com/delta-io/delta) repository if you want to learn more about the Delta Lake project. - -# API documentation - -- Delta Standalone [Java API docs](https://delta-io.github.io/connectors/latest/delta-standalone/api/java/index.html) -- Flink/Delta Connector [Java API docs](https://delta-io.github.io/connectors/latest/delta-flink/api/java/index.html) - -# Delta Standalone +## Delta Standalone Delta Standalone, formerly known as the Delta Standalone Reader (DSR), is a JVM library to read **and write** Delta tables. Unlike https://github.com/delta-io/delta, this project doesn't use Spark to read or write tables and it has only a few transitive dependencies. It can be used by any application that cannot use a Spark cluster. - To compile the project, run `build/sbt standalone/compile` @@ -27,9 +7,6 @@ Delta Standalone, formerly known as the Delta Standalone Reader (DSR), is a JVM See [Delta Standalone](https://docs.delta.io/latest/delta-standalone.html) for detailed documentation. - -# Connectors - ## Hive Connector Read Delta tables directly from Apache Hive using the [Hive Connector](/hive/README.md). See the dedicated [README.md](/hive/README.md) for more details. @@ -45,23 +22,3 @@ Use the [Flink/Delta Connector](flink/README.md) to read and write Delta tables ## Power BI connector The connector for [Microsoft Power BI](https://powerbi.microsoft.com/) is basically just a custom Power Query function that allows you to read a Delta table from any file-based [data source supported by Microsoft Power BI](https://docs.microsoft.com/en-us/power-bi/connect-data/desktop-data-sources). Details can be found in the dedicated [README.md](/powerbi/README.md). -# Reporting issues - -We use [GitHub Issues](https://github.com/delta-io/connectors/issues) to track community reported issues. You can also [contact](#community) the community for getting answers. - -# Contributing - -We welcome contributions to Delta Lake Connectors repository. We use [GitHub Pull Requests](https://github.com/delta-io/connectors/pulls) for accepting changes. - -# Community - -There are two mediums of communication within the Delta Lake community. 
- -- Public Slack Channel - - [Register here](https://go.delta.io/slack) - - [Login here](https://delta-users.slack.com/) - -- Public [Mailing list](https://groups.google.com/forum/#!forum/delta-users) - -# Local Development & Testing -- Before local debugging of `standalone` tests in IntelliJ, run all `standalone` tests using SBT. This helps IntelliJ recognize the golden tables as class resources. diff --git a/connectors/build.sbt b/connectors/build.sbt deleted file mode 100644 index 07ec1e4c872..00000000000 --- a/connectors/build.sbt +++ /dev/null @@ -1,827 +0,0 @@ -/* - * Copyright (2020-present) The Delta Lake Project Authors. - * - * Licensed under the Apache License, Version 2.0 (the "License"); - * you may not use this file except in compliance with the License. - * You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. - */ - -// scalastyle:off line.size.limit - -import ReleaseTransformations._ -import scala.xml.{Node => XmlNode, NodeSeq => XmlNodeSeq, _} -import scala.xml.transform._ - -// Disable parallel execution to workaround https://github.com/etsy/sbt-checkstyle-plugin/issues/32 -concurrentRestrictions in Global := { - Tags.limitAll(1) :: Nil -} - -inThisBuild( - Seq( - parallelExecution := false, - ) -) - -// crossScalaVersions must be set to Nil on the root project -crossScalaVersions := Nil -val scala213 = "2.13.8" -val scala212 = "2.12.8" -val scala211 = "2.11.12" - -lazy val compileScalastyle = taskKey[Unit]("compileScalastyle") -lazy val testScalastyle = taskKey[Unit]("testScalastyle") - -val sparkVersion = "2.4.3" -val hiveDeltaVersion = "0.5.0" -val parquet4sVersion = "1.9.4" -val scalaTestVersion = "3.0.8" -val deltaStorageVersion = "2.4.0" -// Versions for Hive 3 -val hadoopVersion = "3.1.0" -val hiveVersion = "3.1.2" -val tezVersion = "0.9.2" -// Versions for Hive 2 -val hadoopVersionForHive2 = "2.7.2" -val hive2Version = "2.3.3" -val tezVersionForHive2 = "0.8.4" - -def scalacWarningUnusedImport(version: String) = version match { - case v if v.startsWith("2.13.") => "-Ywarn-unused:imports" - case _ => "-Ywarn-unused-import" -} - -lazy val commonSettings = Seq( - organization := "io.delta", - scalaVersion := scala212, - crossScalaVersions := Seq(scala213, scala212, scala211), - fork := true, - javacOptions ++= Seq("-source", "1.8", "-target", "1.8", "-Xlint:unchecked"), - scalacOptions ++= Seq("-target:jvm-1.8", scalacWarningUnusedImport(scalaVersion.value) ), - // Configurations to speed up tests and reduce memory footprint - Test / javaOptions ++= Seq( - "-Dspark.ui.enabled=false", - "-Dspark.ui.showConsoleProgress=false", - "-Dspark.databricks.delta.snapshotPartitions=2", - "-Dspark.sql.shuffle.partitions=5", - "-Ddelta.log.cacheSize=3", - "-Dspark.sql.sources.parallelPartitionDiscovery.parallelism=5", - "-Xmx1024m" - ), - compileScalastyle := (Compile / scalastyle).toTask("").value, - (Compile / compile ) := ((Compile / compile) dependsOn compileScalastyle).value, - testScalastyle := (Test / scalastyle).toTask("").value, - (Test / test) := ((Test / test) dependsOn testScalastyle).value, - - // Can be run explicitly via: build/sbt $module/checkstyle - // Will automatically be run during compilation 
(e.g. build/sbt compile) - // and during tests (e.g. build/sbt test) - checkstyleConfigLocation := CheckstyleConfigLocation.File("dev/checkstyle.xml"), - checkstyleSeverityLevel := Some(CheckstyleSeverityLevel.Error), - (Compile / checkstyle) := (Compile / checkstyle).triggeredBy(Compile / compile).value, - (Test / checkstyle) := (Test / checkstyle).triggeredBy(Test / compile).value -) - -lazy val releaseSettings = Seq( - publishMavenStyle := true, - publishArtifact := true, - Test / publishArtifact := false, - releasePublishArtifactsAction := PgpKeys.publishSigned.value, - releaseCrossBuild := true, - pgpPassphrase := sys.env.get("PGP_PASSPHRASE").map(_.toArray), - sonatypeProfileName := "io.delta", // sonatype account domain name prefix / group ID - credentials += Credentials( - "Sonatype Nexus Repository Manager", - "oss.sonatype.org", - sys.env.getOrElse("SONATYPE_USERNAME", ""), - sys.env.getOrElse("SONATYPE_PASSWORD", "") - ), - publishTo := { - val nexus = "https://oss.sonatype.org/" - if (isSnapshot.value) { - Some("snapshots" at nexus + "content/repositories/snapshots") - } else { - Some("releases" at nexus + "service/local/staging/deploy/maven2") - } - }, - licenses += ("Apache-2.0", url("http://www.apache.org/licenses/LICENSE-2.0")), - pomExtra := - https://github.com/delta-io/connectors - - git@github.com:delta-io/connectors.git - scm:git:git@github.com:delta-io/connectors.git - - - - tdas - Tathagata Das - https://github.com/tdas - - - scottsand-db - Scott Sandre - https://github.com/scottsand-db - - - windpiger - Jun Song - https://github.com/windpiger - - - zsxwing - Shixiong Zhu - https://github.com/zsxwing - - -) - -lazy val skipReleaseSettings = Seq( - publishArtifact := false, - publish / skip := true -) - -// Looks some of release settings should be set for the root project as well. -publishArtifact := false // Don't release the root project -publish / skip := true -publishTo := Some("snapshots" at "https://oss.sonatype.org/content/repositories/snapshots") -releaseCrossBuild := false -releaseProcess := Seq[ReleaseStep]( - checkSnapshotDependencies, - inquireVersions, - runTest, - setReleaseVersion, - commitReleaseVersion, - tagRelease, - releaseStepCommandAndRemaining("+publishSigned"), - setNextVersion, - commitNextVersion -) - -lazy val hive = (project in file("hive")) dependsOn(standaloneCosmetic) settings ( - name := "delta-hive", - commonSettings, - releaseSettings, - - // Minimal dependencies to compile the codes. This project doesn't run any tests so we don't need - // any runtime dependencies. - libraryDependencies ++= Seq( - "org.apache.hadoop" % "hadoop-client" % hadoopVersion % "provided", - "org.apache.hive" % "hive-exec" % hiveVersion % "provided" classifier "core", - "org.apache.hive" % "hive-metastore" % hiveVersion % "provided" - ) -) - -lazy val hiveAssembly = (project in file("hive-assembly")) dependsOn(hive) settings( - name := "delta-hive-assembly", - Compile / unmanagedJars += (hive / assembly).value, - commonSettings, - skipReleaseSettings, - - assembly / logLevel := Level.Info, - assembly / assemblyJarName := s"${name.value}_${scalaBinaryVersion.value}-${version.value}.jar", - assembly / test := {}, - // Make the 'compile' invoke the 'assembly' task to generate the uber jar. - Compile / packageBin := assembly.value -) - -lazy val hiveTest = (project in file("hive-test")) settings ( - name := "hive-test", - // Make the project use the assembly jar to ensure we are testing the assembly jar that users will - // use in real environment. 
- Compile / unmanagedJars += (hiveAssembly / Compile / packageBin / packageBin).value, - commonSettings, - skipReleaseSettings, - libraryDependencies ++= Seq( - "org.apache.hadoop" % "hadoop-client" % hadoopVersion % "provided", - "org.apache.hive" % "hive-exec" % hiveVersion % "provided" classifier "core" excludeAll( - ExclusionRule("org.pentaho", "pentaho-aggdesigner-algorithm"), - ExclusionRule(organization = "org.eclipse.jetty"), - ExclusionRule(organization = "com.google.protobuf") - ), - "org.apache.hive" % "hive-metastore" % hiveVersion % "provided" excludeAll( - ExclusionRule(organization = "org.eclipse.jetty"), - ExclusionRule("org.apache.hive", "hive-exec") - ), - "org.apache.hive" % "hive-cli" % hiveVersion % "test" excludeAll( - ExclusionRule("ch.qos.logback", "logback-classic"), - ExclusionRule("org.pentaho", "pentaho-aggdesigner-algorithm"), - ExclusionRule("org.apache.hive", "hive-exec"), - ExclusionRule("com.google.guava", "guava"), - ExclusionRule(organization = "org.eclipse.jetty"), - ExclusionRule(organization = "com.google.protobuf") - ), - "org.scalatest" %% "scalatest" % scalaTestVersion % "test" - ) -) - -lazy val hiveMR = (project in file("hive-mr")) dependsOn(hiveTest % "test->test") settings ( - name := "hive-mr", - commonSettings, - skipReleaseSettings, - libraryDependencies ++= Seq( - "org.apache.hadoop" % "hadoop-client" % hadoopVersion % "provided", - "org.apache.hive" % "hive-exec" % hiveVersion % "provided" excludeAll( - ExclusionRule(organization = "org.eclipse.jetty"), - ExclusionRule("org.pentaho", "pentaho-aggdesigner-algorithm") - ), - "org.apache.hadoop" % "hadoop-common" % hadoopVersion % "test" classifier "tests", - "org.apache.hadoop" % "hadoop-mapreduce-client-hs" % hadoopVersion % "test", - "org.apache.hadoop" % "hadoop-mapreduce-client-jobclient" % hadoopVersion % "test" classifier "tests", - "org.apache.hadoop" % "hadoop-yarn-server-tests" % hadoopVersion % "test" classifier "tests", - "org.apache.hive" % "hive-cli" % hiveVersion % "test" excludeAll( - ExclusionRule("ch.qos.logback", "logback-classic"), - ExclusionRule("com.google.guava", "guava"), - ExclusionRule(organization = "org.eclipse.jetty"), - ExclusionRule("org.pentaho", "pentaho-aggdesigner-algorithm") - ), - "org.scalatest" %% "scalatest" % scalaTestVersion % "test" - ) -) - -lazy val hiveTez = (project in file("hive-tez")) dependsOn(hiveTest % "test->test") settings ( - name := "hive-tez", - commonSettings, - skipReleaseSettings, - libraryDependencies ++= Seq( - "org.apache.hadoop" % "hadoop-client" % hadoopVersion % "provided" excludeAll ( - ExclusionRule(organization = "com.google.protobuf") - ), - "com.google.protobuf" % "protobuf-java" % "2.5.0", - "org.apache.hive" % "hive-exec" % hiveVersion % "provided" classifier "core" excludeAll( - ExclusionRule("org.pentaho", "pentaho-aggdesigner-algorithm"), - ExclusionRule(organization = "org.eclipse.jetty"), - ExclusionRule(organization = "com.google.protobuf") - ), - "org.jodd" % "jodd-core" % "3.5.2", - "org.apache.hive" % "hive-metastore" % hiveVersion % "provided" excludeAll( - ExclusionRule(organization = "org.eclipse.jetty"), - ExclusionRule("org.apache.hive", "hive-exec") - ), - "org.apache.hadoop" % "hadoop-common" % hadoopVersion % "test" classifier "tests", - "org.apache.hadoop" % "hadoop-mapreduce-client-hs" % hadoopVersion % "test", - "org.apache.hadoop" % "hadoop-mapreduce-client-jobclient" % hadoopVersion % "test" classifier "tests", - "org.apache.hadoop" % "hadoop-yarn-server-tests" % hadoopVersion % "test" classifier 
"tests", - "org.apache.hive" % "hive-cli" % hiveVersion % "test" excludeAll( - ExclusionRule("ch.qos.logback", "logback-classic"), - ExclusionRule("org.pentaho", "pentaho-aggdesigner-algorithm"), - ExclusionRule("org.apache.hive", "hive-exec"), - ExclusionRule(organization = "org.eclipse.jetty"), - ExclusionRule(organization = "com.google.protobuf") - ), - "org.apache.hadoop" % "hadoop-yarn-common" % hadoopVersion % "test", - "org.apache.hadoop" % "hadoop-yarn-api" % hadoopVersion % "test", - "org.apache.tez" % "tez-mapreduce" % tezVersion % "test", - "org.apache.tez" % "tez-dag" % tezVersion % "test", - "org.apache.tez" % "tez-tests" % tezVersion % "test" classifier "tests", - "com.esotericsoftware" % "kryo-shaded" % "4.0.2" % "test", - "org.scalatest" %% "scalatest" % scalaTestVersion % "test" - ) -) - - -lazy val hive2MR = (project in file("hive2-mr")) settings ( - name := "hive2-mr", - commonSettings, - skipReleaseSettings, - Compile / unmanagedJars ++= Seq( - (hiveAssembly / Compile / packageBin / packageBin).value, - (hiveTest / Test / packageBin / packageBin).value - ), - libraryDependencies ++= Seq( - "org.apache.hadoop" % "hadoop-client" % hadoopVersionForHive2 % "provided", - "org.apache.hive" % "hive-exec" % hive2Version % "provided" excludeAll( - ExclusionRule(organization = "org.eclipse.jetty"), - ExclusionRule("org.pentaho", "pentaho-aggdesigner-algorithm") - ), - "org.apache.hadoop" % "hadoop-common" % hadoopVersionForHive2 % "test" classifier "tests", - "org.apache.hadoop" % "hadoop-mapreduce-client-hs" % hadoopVersionForHive2 % "test", - "org.apache.hadoop" % "hadoop-mapreduce-client-jobclient" % hadoopVersionForHive2 % "test" classifier "tests", - "org.apache.hadoop" % "hadoop-yarn-server-tests" % hadoopVersionForHive2 % "test" classifier "tests", - "org.apache.hive" % "hive-cli" % hive2Version % "test" excludeAll( - ExclusionRule("ch.qos.logback", "logback-classic"), - ExclusionRule("com.google.guava", "guava"), - ExclusionRule(organization = "org.eclipse.jetty"), - ExclusionRule("org.pentaho", "pentaho-aggdesigner-algorithm") - ), - "org.scalatest" %% "scalatest" % scalaTestVersion % "test" - ) -) - -lazy val hive2Tez = (project in file("hive2-tez")) settings ( - name := "hive2-tez", - commonSettings, - skipReleaseSettings, - Compile / unmanagedJars ++= Seq( - (hiveAssembly / Compile / packageBin / packageBin).value, - (hiveTest / Test / packageBin / packageBin).value - ), - libraryDependencies ++= Seq( - "org.apache.hadoop" % "hadoop-client" % hadoopVersionForHive2 % "provided" excludeAll ( - ExclusionRule(organization = "com.google.protobuf") - ), - "com.google.protobuf" % "protobuf-java" % "2.5.0", - "org.apache.hive" % "hive-exec" % hive2Version % "provided" classifier "core" excludeAll( - ExclusionRule("org.pentaho", "pentaho-aggdesigner-algorithm"), - ExclusionRule(organization = "org.eclipse.jetty"), - ExclusionRule(organization = "com.google.protobuf") - ), - "org.jodd" % "jodd-core" % "3.5.2", - "org.apache.hive" % "hive-metastore" % hive2Version % "provided" excludeAll( - ExclusionRule(organization = "org.eclipse.jetty"), - ExclusionRule("org.apache.hive", "hive-exec") - ), - "org.apache.hadoop" % "hadoop-common" % hadoopVersionForHive2 % "test" classifier "tests", - "org.apache.hadoop" % "hadoop-mapreduce-client-hs" % hadoopVersionForHive2 % "test", - "org.apache.hadoop" % "hadoop-mapreduce-client-jobclient" % hadoopVersionForHive2 % "test" classifier "tests", - "org.apache.hadoop" % "hadoop-yarn-server-tests" % hadoopVersionForHive2 % "test" classifier 
"tests", - "org.apache.hive" % "hive-cli" % hive2Version % "test" excludeAll( - ExclusionRule("ch.qos.logback", "logback-classic"), - ExclusionRule("org.pentaho", "pentaho-aggdesigner-algorithm"), - ExclusionRule("org.apache.hive", "hive-exec"), - ExclusionRule(organization = "org.eclipse.jetty"), - ExclusionRule(organization = "com.google.protobuf") - ), - "org.apache.hadoop" % "hadoop-yarn-common" % hadoopVersionForHive2 % "test", - "org.apache.hadoop" % "hadoop-yarn-api" % hadoopVersionForHive2 % "test", - "org.apache.tez" % "tez-mapreduce" % tezVersionForHive2 % "test", - "org.apache.tez" % "tez-dag" % tezVersionForHive2 % "test", - "org.apache.tez" % "tez-tests" % tezVersionForHive2 % "test" classifier "tests", - "com.esotericsoftware" % "kryo-shaded" % "4.0.2" % "test", - "org.scalatest" %% "scalatest" % scalaTestVersion % "test" - ) -) - -/** - * We want to publish the `standalone` project's shaded JAR (created from the - * build/sbt standalone/assembly command). - * - * However, build/sbt standalone/publish and build/sbt standalone/publishLocal will use the - * non-shaded JAR from the build/sbt standalone/package command. - * - * So, we create an impostor, cosmetic project used only for publishing. - * - * build/sbt standaloneCosmetic/package - * - creates connectors/standalone/target/scala-2.12/delta-standalone-original-shaded_2.12-0.2.1-SNAPSHOT.jar - * (this is the shaded JAR we want) - * - * build/sbt standaloneCosmetic/publishM2 - * - packages the shaded JAR (above) and then produces: - * -- .m2/repository/io/delta/delta-standalone_2.12/0.2.1-SNAPSHOT/delta-standalone_2.12-0.2.1-SNAPSHOT.pom - * -- .m2/repository/io/delta/delta-standalone_2.12/0.2.1-SNAPSHOT/delta-standalone_2.12-0.2.1-SNAPSHOT.jar - * -- .m2/repository/io/delta/delta-standalone_2.12/0.2.1-SNAPSHOT/delta-standalone_2.12-0.2.1-SNAPSHOT-sources.jar - * -- .m2/repository/io/delta/delta-standalone_2.12/0.2.1-SNAPSHOT/delta-standalone_2.12-0.2.1-SNAPSHOT-javadoc.jar - */ -lazy val standaloneCosmetic = project - .settings( - name := "delta-standalone", - commonSettings, - releaseSettings, - exportJars := true, - Compile / packageBin := (standaloneParquet / assembly).value, - libraryDependencies ++= scalaCollectionPar(scalaVersion.value) ++ Seq( - "org.apache.hadoop" % "hadoop-client" % hadoopVersion % "provided", - "org.apache.parquet" % "parquet-hadoop" % "1.12.0" % "provided", - "io.delta" % "delta-storage" % deltaStorageVersion, - // parquet4s-core dependencies that are not shaded are added with compile scope. - "com.chuusai" %% "shapeless" % "2.3.4", - "org.scala-lang.modules" %% "scala-collection-compat" % "2.4.3" - ) - ) - -lazy val testStandaloneCosmetic = project.dependsOn(standaloneCosmetic) - .settings( - name := "test-standalone-cosmetic", - commonSettings, - skipReleaseSettings, - libraryDependencies ++= Seq( - "org.apache.hadoop" % "hadoop-client" % hadoopVersion, - "org.scalatest" %% "scalatest" % scalaTestVersion % "test", - ) - ) - -/** - * A test project to verify `ParquetSchemaConverter` APIs are working after the user provides - * `parquet-hadoop`. We use a separate project because we want to test whether Delta Standlone APIs - * except `ParquetSchemaConverter` are working without `parquet-hadoop` in testStandaloneCosmetic`. 
- */ -lazy val testParquetUtilsWithStandaloneCosmetic = project.dependsOn(standaloneCosmetic) - .settings( - name := "test-parquet-utils-with-standalone-cosmetic", - commonSettings, - skipReleaseSettings, - libraryDependencies ++= Seq( - "org.apache.hadoop" % "hadoop-client" % hadoopVersion, - "org.apache.parquet" % "parquet-hadoop" % "1.12.0" % "provided", - "org.scalatest" %% "scalatest" % scalaTestVersion % "test", - ) - ) - -def scalaCollectionPar(version: String) = version match { - case v if v.startsWith("2.13.") => - Seq("org.scala-lang.modules" %% "scala-parallel-collections" % "1.0.4") - case _ => Seq() -} - -/** - * The public API ParquetSchemaConverter exposes Parquet classes in its methods so we cannot apply - * shading rules on it. However, sbt-assembly doesn't allow excluding a single file. Hence, we - * create a separate project to skip the shading. - */ -lazy val standaloneParquet = (project in file("standalone-parquet")) - .dependsOn(standaloneWithoutParquetUtils) - .settings( - name := "delta-standalone-parquet", - commonSettings, - skipReleaseSettings, - libraryDependencies ++= Seq( - "org.apache.parquet" % "parquet-hadoop" % "1.12.0" % "provided", - "org.scalatest" %% "scalatest" % scalaTestVersion % "test" - ), - assemblyPackageScala / assembleArtifact := false - ) - -/** A dummy project to allow `standaloneParquet` depending on the shaded standalone jar. */ -lazy val standaloneWithoutParquetUtils = project - .settings( - name := "delta-standalone-without-parquet-utils", - commonSettings, - skipReleaseSettings, - exportJars := true, - Compile / packageBin := (standalone / assembly).value - ) - -lazy val standalone = (project in file("standalone")) - .enablePlugins(GenJavadocPlugin, JavaUnidocPlugin) - .settings( - name := "delta-standalone-original", - commonSettings, - skipReleaseSettings, - mimaSettings, // TODO(scott): move this to standaloneCosmetic - // When updating any dependency here, we should also review `pomPostProcess` in project - // `standaloneCosmetic` and update it accordingly. - libraryDependencies ++= scalaCollectionPar(scalaVersion.value) ++ Seq( - "org.apache.hadoop" % "hadoop-client" % hadoopVersion % "provided", - "com.github.mjakubowski84" %% "parquet4s-core" % parquet4sVersion excludeAll ( - ExclusionRule("org.slf4j", "slf4j-api") - ), - "com.fasterxml.jackson.module" %% "jackson-module-scala" % "2.12.3", - "org.json4s" %% "json4s-jackson" % "3.7.0-M11" excludeAll ( - ExclusionRule("com.fasterxml.jackson.core"), - ExclusionRule("com.fasterxml.jackson.module") - ), - "org.scalatest" %% "scalatest" % scalaTestVersion % "test", - "io.delta" % "delta-storage" % deltaStorageVersion, - - // Compiler plugins - // -- Bump up the genjavadoc version explicitly to 0.18 to work with Scala 2.12 - compilerPlugin("com.typesafe.genjavadoc" %% "genjavadoc-plugin" % "0.18" cross CrossVersion.full) - ), - Compile / sourceGenerators += Def.task { - val file = (Compile / sourceManaged).value / "io" / "delta" / "standalone" / "package.scala" - IO.write(file, - s"""package io.delta - | - |package object standalone { - | val VERSION = "${version.value}" - | val NAME = "Delta Standalone" - |} - |""".stripMargin) - Seq(file) - }, - - /** - * Standalone packaged (unshaded) jar. - * - * Build with `build/sbt standalone/package` command. - * e.g. 
connectors/standalone/target/scala-2.12/delta-standalone-original-unshaded_2.12-0.2.1-SNAPSHOT.jar - */ - artifactName := { (sv: ScalaVersion, module: ModuleID, artifact: Artifact) => - artifact.name + "-unshaded" + "_" + sv.binary + "-" + module.revision + "." + artifact.extension - }, - - /** - * Standalone assembly (shaded) jar. This is what we want to release. - * - * Build with `build/sbt standalone/assembly` command. - * e.g. connectors/standalone/target/scala-2.12/delta-standalone-original-shaded_2.12-0.2.1-SNAPSHOT.jar - */ - assembly / logLevel := Level.Info, - assembly / test := {}, - assembly / assemblyJarName := s"${name.value}-shaded_${scalaBinaryVersion.value}-${version.value}.jar", - // we exclude jars first, and then we shade what is remaining - assembly / assemblyExcludedJars := { - val cp = (assembly / fullClasspath).value - val allowedPrefixes = Set("META_INF", "io", "json4s", "jackson", "paranamer", - "parquet4s", "parquet-", "audience-annotations", "commons-pool") - cp.filter { f => - !allowedPrefixes.exists(prefix => f.data.getName.startsWith(prefix)) - } - }, - assembly / assemblyShadeRules := Seq( - ShadeRule.rename("com.fasterxml.jackson.**" -> "shadedelta.@0").inAll, - ShadeRule.rename("com.thoughtworks.paranamer.**" -> "shadedelta.@0").inAll, - ShadeRule.rename("org.json4s.**" -> "shadedelta.@0").inAll, - ShadeRule.rename("com.github.mjakubowski84.parquet4s.**" -> "shadedelta.@0").inAll, - ShadeRule.rename("org.apache.commons.pool.**" -> "shadedelta.@0").inAll, - ShadeRule.rename("org.apache.parquet.**" -> "shadedelta.@0").inAll, - ShadeRule.rename("shaded.parquet.**" -> "shadedelta.@0").inAll, - ShadeRule.rename("org.apache.yetus.audience.**" -> "shadedelta.@0").inAll - ), - assembly / assemblyMergeStrategy := { - // Discard `module-info.class` to fix the `different file contents found` error. - // TODO Upgrade SBT to 1.5 which will do this automatically - case "module-info.class" => MergeStrategy.discard - // Discard unused `parquet.thrift` so that we don't conflict the file used by the user - case "parquet.thrift" => MergeStrategy.discard - // Discard the jackson service configs that we don't need. These files are not shaded so - // adding them may conflict with other jackson version used by the user. - case PathList("META-INF", "services", xs @ _*) => MergeStrategy.discard - case x => - val oldStrategy = (assembly / assemblyMergeStrategy).value - oldStrategy(x) - }, - assembly / artifact := { - val art = (assembly / artifact).value - art.withClassifier(Some("assembly")) - }, - addArtifact(assembly / artifact, assembly), - /** - * Unidoc settings - * Generate javadoc with `unidoc` command, outputs to `standalone/target/javaunidoc` - */ - JavaUnidoc / unidoc / javacOptions := Seq( - "-public", - "-windowtitle", "Delta Standalone " + version.value.replaceAll("-SNAPSHOT", "") + " JavaDoc", - "-noqualifier", "java.lang", - "-tag", "implNote:a:Implementation Note:", - "-Xdoclint:all" - ), - JavaUnidoc / unidoc / unidocAllSources := { - (JavaUnidoc / unidoc / unidocAllSources).value - // ignore any internal Scala code - .map(_.filterNot(_.getName.contains("$"))) - .map(_.filterNot(_.getCanonicalPath.contains("/internal/"))) - // ignore project `hive` which depends on this project - .map(_.filterNot(_.getCanonicalPath.contains("/hive/"))) - // ignore project `flink` which depends on this project - .map(_.filterNot(_.getCanonicalPath.contains("/flink/"))) - }, - // Ensure unidoc is run with tests. Must be cleaned before test for unidoc to be generated. 
- (Test / test) := ((Test / test) dependsOn (Compile / unidoc)).value - ) - -/* - ******************** - * MIMA settings * - ******************** - */ -def getPrevVersion(currentVersion: String): String = { - implicit def extractInt(str: String): Int = { - """\d+""".r.findFirstIn(str).map(java.lang.Integer.parseInt).getOrElse { - throw new Exception(s"Could not extract version number from $str in $version") - } - } - - val (major, minor, patch): (Int, Int, Int) = { - currentVersion.split("\\.").toList match { - case majorStr :: minorStr :: patchStr :: _ => - (majorStr, minorStr, patchStr) - case _ => throw new Exception(s"Could not find previous version for $version.") - } - } - - val majorToLastMinorVersions: Map[Int, Int] = Map( - // TODO add mapping when required - // e.g. 0 -> 8 - ) - if (minor == 0) { // 1.0.0 - val prevMinor = majorToLastMinorVersions.getOrElse(major - 1, { - throw new Exception(s"Last minor version of ${major - 1}.x.x not configured.") - }) - s"${major - 1}.$prevMinor.0" // 1.0.0 -> 0.8.0 - } else if (patch == 0) { - s"$major.${minor - 1}.0" // 1.1.0 -> 1.0.0 - } else { - s"$major.$minor.${patch - 1}" // 1.1.1 -> 1.1.0 - } -} - -lazy val mimaSettings = Seq( - (Test / test) := ((Test / test) dependsOn mimaReportBinaryIssues).value, - mimaPreviousArtifacts := { - if (CrossVersion.partialVersion(scalaVersion.value) == Some((2, 13))) { - // Skip mima check since we don't have a Scala 2.13 release yet. - // TODO Update this after releasing 0.4.0. - Set.empty - } else { - Set("io.delta" %% "delta-standalone" % getPrevVersion(version.value)) - } - }, - mimaBinaryIssueFilters ++= StandaloneMimaExcludes.ignoredABIProblems -) - -lazy val compatibility = (project in file("oss-compatibility-tests")) - // depend on standalone test codes as well - .dependsOn(standalone % "compile->compile;test->test") - .settings( - name := "compatibility", - commonSettings, - skipReleaseSettings, - libraryDependencies ++= Seq( - // Test Dependencies - "io.netty" % "netty-buffer" % "4.1.63.Final" % "test", - "org.scalatest" %% "scalatest" % "3.1.0" % "test", - "org.apache.spark" % "spark-sql_2.12" % "3.2.0" % "test", - "io.delta" % "delta-core_2.12" % "1.1.0" % "test", - "commons-io" % "commons-io" % "2.8.0" % "test", - "org.apache.spark" % "spark-catalyst_2.12" % "3.2.0" % "test" classifier "tests", - "org.apache.spark" % "spark-core_2.12" % "3.2.0" % "test" classifier "tests", - "org.apache.spark" % "spark-sql_2.12" % "3.2.0" % "test" classifier "tests", - ) - ) - -lazy val goldenTables = (project in file("golden-tables")) settings ( - name := "golden-tables", - commonSettings, - skipReleaseSettings, - libraryDependencies ++= Seq( - // Test Dependencies - "org.scalatest" %% "scalatest" % "3.1.0" % "test", - "org.apache.spark" % "spark-sql_2.12" % "3.2.0" % "test", - "io.delta" % "delta-core_2.12" % "1.1.0" % "test", - "commons-io" % "commons-io" % "2.8.0" % "test", - "org.apache.spark" % "spark-catalyst_2.12" % "3.2.0" % "test" classifier "tests", - "org.apache.spark" % "spark-core_2.12" % "3.2.0" % "test" classifier "tests", - "org.apache.spark" % "spark-sql_2.12" % "3.2.0" % "test" classifier "tests" - ) -) - -def sqlDeltaImportScalaVersion(scalaBinaryVersion: String): String = { - scalaBinaryVersion match { - // sqlDeltaImport doesn't support 2.11. We return 2.12 so that we can resolve the dependencies - // but we will not publish sqlDeltaImport with Scala 2.11. 
- case "2.11" => "2.12" - case _ => scalaBinaryVersion - } -} - -lazy val sqlDeltaImport = (project in file("sql-delta-import")) - .settings ( - name := "sql-delta-import", - commonSettings, - skipReleaseSettings, - publishArtifact := scalaBinaryVersion.value != "2.11", - Test / publishArtifact := false, - libraryDependencies ++= Seq( - "io.netty" % "netty-buffer" % "4.1.63.Final" % "test", - "org.apache.spark" % ("spark-sql_" + sqlDeltaImportScalaVersion(scalaBinaryVersion.value)) % "3.2.0" % "provided", - "io.delta" % ("delta-core_" + sqlDeltaImportScalaVersion(scalaBinaryVersion.value)) % "1.1.0" % "provided", - "org.rogach" %% "scallop" % "3.5.1", - "org.scalatest" %% "scalatest" % scalaTestVersion % "test", - "com.h2database" % "h2" % "1.4.200" % "test", - "org.apache.spark" % ("spark-catalyst_" + sqlDeltaImportScalaVersion(scalaBinaryVersion.value)) % "3.2.0" % "test", - "org.apache.spark" % ("spark-core_" + sqlDeltaImportScalaVersion(scalaBinaryVersion.value)) % "3.2.0" % "test", - "org.apache.spark" % ("spark-sql_" + sqlDeltaImportScalaVersion(scalaBinaryVersion.value)) % "3.2.0" % "test" - ) - ) - -def flinkScalaVersion(scalaBinaryVersion: String): String = { - scalaBinaryVersion match { - // Flink doesn't support 2.13. We return 2.12 so that we can resolve the dependencies but we - // will not publish Flink connector with Scala 2.13. - case "2.13" => "2.12" - case _ => scalaBinaryVersion - } -} - -val flinkVersion = "1.16.1" -lazy val flink = (project in file("flink")) - .dependsOn(standaloneCosmetic % "provided") - .enablePlugins(GenJavadocPlugin, JavaUnidocPlugin) - .settings ( - name := "delta-flink", - commonSettings, - releaseSettings, - publishArtifact := scalaBinaryVersion.value == "2.12", // only publish once - autoScalaLibrary := false, // exclude scala-library from dependencies - Test / publishArtifact := false, - pomExtra := - https://github.com/delta-io/connectors - - git@github.com:delta-io/connectors.git - scm:git:git@github.com:delta-io/connectors.git - - - - pkubit-g - Paweł Kubit - https://github.com/pkubit-g - - - kristoffSC - Krzysztof Chmielewski - https://github.com/kristoffSC - - , - crossPaths := false, - libraryDependencies ++= Seq( - "org.apache.flink" % "flink-parquet" % flinkVersion % "provided", - "org.apache.flink" % "flink-table-common" % flinkVersion % "provided", - "org.apache.hadoop" % "hadoop-client" % hadoopVersion % "provided", - "org.apache.flink" % "flink-connector-files" % flinkVersion % "provided", - "org.apache.flink" % "flink-connector-files" % flinkVersion % "test" classifier "tests", - "org.apache.flink" % "flink-table-runtime" % flinkVersion % "provided", - "org.apache.flink" % "flink-scala_2.12" % flinkVersion % "provided", - "org.apache.flink" % "flink-runtime-web" % flinkVersion % "test", - "org.apache.flink" % "flink-connector-test-utils" % flinkVersion % "test", - "org.apache.flink" % "flink-clients" % flinkVersion % "test", - "org.apache.flink" % "flink-test-utils" % flinkVersion % "test", - "org.apache.hadoop" % "hadoop-common" % hadoopVersion % "test" classifier "tests", - "org.mockito" % "mockito-inline" % "4.11.0" % "test", - "net.aichler" % "jupiter-interface" % JupiterKeys.jupiterVersion.value % Test, - "org.junit.vintage" % "junit-vintage-engine" % "5.8.2" % "test", - "org.mockito" % "mockito-junit-jupiter" % "4.11.0" % "test", - "org.junit.jupiter" % "junit-jupiter-params" % "5.8.2" % "test", - "io.github.artsok" % "rerunner-jupiter" % "2.1.6" % "test", - - // Exclusions due to conflicts with Flink's libraries from table 
planer, hive, calcite etc. - "org.apache.hive" % "hive-metastore" % "3.1.2" % "test" excludeAll( - ExclusionRule("org.apache.avro", "avro"), - ExclusionRule("org.slf4j", "slf4j-log4j12"), - ExclusionRule("org.pentaho"), - ExclusionRule("org.apache.hbase"), - ExclusionRule("org.apache.hbase"), - ExclusionRule("co.cask.tephra"), - ExclusionRule("com.google.code.findbugs", "jsr305"), - ExclusionRule("org.eclipse.jetty.aggregate", "module: 'jetty-all"), - ExclusionRule("org.eclipse.jetty.orbit", "javax.servlet"), - ExclusionRule("org.apache.parquet", "parquet-hadoop-bundle"), - ExclusionRule("com.tdunning", "json"), - ExclusionRule("javax.transaction", "transaction-api"), - ExclusionRule("'com.zaxxer", "HikariCP"), - ), - // Exclusions due to conflicts with Flink's libraries from table planer, hive, calcite etc. - "org.apache.hive" % "hive-exec" % "3.1.2" % "test" classifier "core" excludeAll( - ExclusionRule("'org.apache.avro", "avro"), - ExclusionRule("org.slf4j", "slf4j-log4j12"), - ExclusionRule("org.pentaho"), - ExclusionRule("com.google.code.findbugs", "jsr305"), - ExclusionRule("org.apache.calcite.avatica"), - ExclusionRule("org.apache.calcite"), - ExclusionRule("org.apache.hive", "hive-llap-tez"), - ExclusionRule("org.apache.logging.log4j"), - ExclusionRule("com.google.protobuf", "protobuf-java"), - ), - - // Compiler plugins - // -- Bump up the genjavadoc version explicitly to 0.18 to work with Scala 2.12 - compilerPlugin("com.typesafe.genjavadoc" %% "genjavadoc-plugin" % "0.18" cross CrossVersion.full) - ), - // generating source java class with version number to be passed during commit to the DeltaLog as engine info - // (part of transaction's metadata) - Compile / sourceGenerators += Def.task { - val file = (Compile / sourceManaged).value / "meta" / "Meta.java" - IO.write(file, - s"""package io.delta.flink.internal; - | - |final class Meta { - | public static final String FLINK_VERSION = "${flinkVersion}"; - | public static final String CONNECTOR_VERSION = "${version.value}"; - |} - |""".stripMargin) - Seq(file) - }, - /** - * Unidoc settings - * Generate javadoc with `unidoc` command, outputs to `flink/target/javaunidoc` - * e.g. build/sbt flink/unidoc - */ - JavaUnidoc / unidoc / javacOptions := Seq( - "-public", - "-windowtitle", "Flink/Delta Connector " + version.value.replaceAll("-SNAPSHOT", "") + " JavaDoc", - "-noqualifier", "java.lang", - "-tag", "implNote:a:Implementation Note:", - "-tag", "apiNote:a:API Note:", - "-Xdoclint:all" - ), - Compile / doc / javacOptions := (JavaUnidoc / unidoc / javacOptions).value, - JavaUnidoc / unidoc / unidocAllSources := { - (JavaUnidoc / unidoc / unidocAllSources).value - // include only relevant delta-flink classes - .map(_.filter(_.getCanonicalPath.contains("/flink/"))) - // exclude internal classes - .map(_.filterNot(_.getCanonicalPath.contains("/internal/"))) - // exclude flink package - .map(_.filterNot(_.getCanonicalPath.contains("org/apache/flink/"))) - }, - // Ensure unidoc is run with tests. Must be cleaned before test for unidoc to be generated. - (Test / test) := ((Test / test) dependsOn (Compile / unidoc)).value - ) diff --git a/connectors/build/sbt b/connectors/build/sbt deleted file mode 100755 index e2b247e35c8..00000000000 --- a/connectors/build/sbt +++ /dev/null @@ -1,183 +0,0 @@ -#!/usr/bin/env bash - -# -# Licensed to the Apache Software Foundation (ASF) under one or more -# contributor license agreements. See the NOTICE file distributed with -# this work for additional information regarding copyright ownership. 
-# The ASF licenses this file to You under the Apache License, Version 2.0 -# (the "License"); you may not use this file except in compliance with -# the License. You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. -# - -# -# This file contains code from the Apache Spark project (original license above). -# It contains modifications, which are licensed as follows: -# - -# -# Copyright (2020-present) The Delta Lake Project Authors. -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# http://www.apache.org/licenses/LICENSE-2.0 -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. -# - - -# When creating new tests for Spark SQL Hive, the HADOOP_CLASSPATH must contain the hive jars so -# that we can run Hive to generate the golden answer. This is not required for normal development -# or testing. -if [ -n "$HIVE_HOME" ]; then - for i in "$HIVE_HOME"/lib/* - do HADOOP_CLASSPATH="$HADOOP_CLASSPATH:$i" - done - export HADOOP_CLASSPATH -fi - -realpath () { -( - TARGET_FILE="$1" - - cd "$(dirname "$TARGET_FILE")" - TARGET_FILE="$(basename "$TARGET_FILE")" - - COUNT=0 - while [ -L "$TARGET_FILE" -a $COUNT -lt 100 ] - do - TARGET_FILE="$(readlink "$TARGET_FILE")" - cd $(dirname "$TARGET_FILE") - TARGET_FILE="$(basename $TARGET_FILE)" - COUNT=$(($COUNT + 1)) - done - - echo "$(pwd -P)/"$TARGET_FILE"" -) -} - -if [[ "$JENKINS_URL" != "" ]]; then - # Make Jenkins use Google Mirror first as Maven Central may ban us - SBT_REPOSITORIES_CONFIG="$(dirname "$(realpath "$0")")/sbt-config/repositories" - export SBT_OPTS="-Dsbt.override.build.repos=true -Dsbt.repository.config=$SBT_REPOSITORIES_CONFIG" -fi - -. "$(dirname "$(realpath "$0")")"/sbt-launch-lib.bash - - -declare -r noshare_opts="-Dsbt.global.base=project/.sbtboot -Dsbt.boot.directory=project/.boot -Dsbt.ivy.home=project/.ivy" -declare -r sbt_opts_file=".sbtopts" -declare -r etc_sbt_opts_file="/etc/sbt/sbtopts" - -usage() { - cat < path to global settings/plugins directory (default: ~/.sbt) - -sbt-boot path to shared boot directory (default: ~/.sbt/boot in 0.11 series) - -ivy path to local Ivy repository (default: ~/.ivy2) - -mem set memory options (default: $sbt_mem, which is $(get_mem_opts $sbt_mem)) - -no-share use all local caches; no sharing - -no-global uses global caches, but does not use global ~/.sbt directory. - -jvm-debug Turn on JVM debugging, open at the given port. 
- -batch Disable interactive mode - - # sbt version (default: from project/build.properties if present, else latest release) - -sbt-version use the specified version of sbt - -sbt-jar use the specified jar as the sbt launcher - -sbt-rc use an RC version of sbt - -sbt-snapshot use a snapshot version of sbt - - # java version (default: java from PATH, currently $(java -version 2>&1 | grep version)) - -java-home alternate JAVA_HOME - - # jvm options and output control - JAVA_OPTS environment variable, if unset uses "$java_opts" - SBT_OPTS environment variable, if unset uses "$default_sbt_opts" - .sbtopts if this file exists in the current directory, it is - prepended to the runner args - /etc/sbt/sbtopts if this file exists, it is prepended to the runner args - -Dkey=val pass -Dkey=val directly to the java runtime - -J-X pass option -X directly to the java runtime - (-J is stripped) - -S-X add -X to sbt's scalacOptions (-S is stripped) - -PmavenProfiles Enable a maven profile for the build. - -In the case of duplicated or conflicting options, the order above -shows precedence: JAVA_OPTS lowest, command line options highest. -EOM -} - -process_my_args () { - while [[ $# -gt 0 ]]; do - case "$1" in - -no-colors) addJava "-Dsbt.log.noformat=true" && shift ;; - -no-share) addJava "$noshare_opts" && shift ;; - -no-global) addJava "-Dsbt.global.base=$(pwd)/project/.sbtboot" && shift ;; - -sbt-boot) require_arg path "$1" "$2" && addJava "-Dsbt.boot.directory=$2" && shift 2 ;; - -sbt-dir) require_arg path "$1" "$2" && addJava "-Dsbt.global.base=$2" && shift 2 ;; - -debug-inc) addJava "-Dxsbt.inc.debug=true" && shift ;; - -batch) exec /dev/null) - if [[ ! $? ]]; then - saved_stty="" - fi -} - -saveSttySettings -trap onExit INT - -run "$@" - -exit_status=$? -onExit diff --git a/connectors/build/sbt-config/repositories b/connectors/build/sbt-config/repositories deleted file mode 100644 index dcac6f66c19..00000000000 --- a/connectors/build/sbt-config/repositories +++ /dev/null @@ -1,11 +0,0 @@ -[repositories] - local - local-preloaded-ivy: file:///${sbt.preloaded-${sbt.global.base-${user.home}/.sbt}/preloaded/}, [organization]/[module]/[revision]/[type]s/[artifact](-[classifier]).[ext] - local-preloaded: file:///${sbt.preloaded-${sbt.global.base-${user.home}/.sbt}/preloaded/} - gcs-maven-central-mirror: https://maven-central.storage-download.googleapis.com/repos/central/data/ - maven-central - typesafe-ivy-releases: https://repo.typesafe.com/typesafe/ivy-releases/, [organization]/[module]/[revision]/[type]s/[artifact](-[classifier]).[ext], bootOnly - sbt-ivy-snapshots: https://repo.scala-sbt.org/scalasbt/ivy-snapshots/, [organization]/[module]/[revision]/[type]s/[artifact](-[classifier]).[ext], bootOnly - sbt-plugin-releases: https://repo.scala-sbt.org/scalasbt/sbt-plugin-releases/, [organization]/[module]/(scala_[scalaVersion]/)(sbt_[sbtVersion]/)[revision]/[type]s/[artifact](-[classifier]).[ext] - bintray-spark-packages: https://dl.bintray.com/spark-packages/maven/ - typesafe-releases: http://repo.typesafe.com/typesafe/releases/ diff --git a/connectors/build/sbt-launch-lib.bash b/connectors/build/sbt-launch-lib.bash deleted file mode 100755 index 3d133f7e1cc..00000000000 --- a/connectors/build/sbt-launch-lib.bash +++ /dev/null @@ -1,197 +0,0 @@ -#!/usr/bin/env bash -# - -# A library to simplify using the SBT launcher from other packages. -# Note: This should be used by tools like giter8/conscript etc. - -# TODO - Should we merge the main SBT script with this library? 
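The sbt-config/repositories file above pins the launcher's resolvers, placing the Google-hosted Maven Central mirror ahead of Maven Central for CI runs. As a hedged aside (not part of the deleted files), the same mirror could be expressed as an ordinary resolver in a build definition:

```
// Hypothetical build.sbt line for illustration; the deleted build instead relies on
// SBT_OPTS=-Dsbt.override.build.repos=true pointing at build/sbt-config/repositories.
resolvers += "gcs-maven-central-mirror" at
  "https://maven-central.storage-download.googleapis.com/repos/central/data/"
```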
- -if test -z "$HOME"; then - declare -r script_dir="$(dirname "$script_path")" -else - declare -r script_dir="$HOME/.sbt" -fi - -declare -a residual_args -declare -a java_args -declare -a scalac_args -declare -a sbt_commands -declare -a maven_profiles - -if test -x "$JAVA_HOME/bin/java"; then - echo -e "Using $JAVA_HOME as default JAVA_HOME." - echo "Note, this will be overridden by -java-home if it is set." - declare java_cmd="$JAVA_HOME/bin/java" -else - declare java_cmd=java -fi - -echoerr () { - echo 1>&2 "$@" -} -vlog () { - [[ $verbose || $debug ]] && echoerr "$@" -} -dlog () { - [[ $debug ]] && echoerr "$@" -} - -acquire_sbt_jar () { - SBT_VERSION=`awk -F "=" '/sbt\.version/ {print $2}' ./project/build.properties` - - # Download sbt from mirror URL if the environment variable is provided - if [[ "${SBT_VERSION}" == "0.13.18" ]] && [[ -n "${SBT_MIRROR_JAR_URL}" ]]; then - URL1="${SBT_MIRROR_JAR_URL}" - elif [[ "${SBT_VERSION}" == "1.5.5" ]] && [[ -n "${SBT_1_5_5_MIRROR_JAR_URL}" ]]; then - URL1="${SBT_1_5_5_MIRROR_JAR_URL}" - else - URL1=${DEFAULT_ARTIFACT_REPOSITORY:-https://repo1.maven.org/maven2/}org/scala-sbt/sbt-launch/${SBT_VERSION}/sbt-launch-${SBT_VERSION}.jar - fi - - JAR=build/sbt-launch-${SBT_VERSION}.jar - sbt_jar=$JAR - - if [[ ! -f "$sbt_jar" ]]; then - # Download sbt launch jar if it hasn't been downloaded yet - if [ ! -f "${JAR}" ]; then - # Download - printf 'Attempting to fetch sbt from %s\n' "${URL1}" - JAR_DL="${JAR}.part" - if [ $(command -v curl) ]; then - curl --fail --location --silent ${URL1} > "${JAR_DL}" &&\ - mv "${JAR_DL}" "${JAR}" - elif [ $(command -v wget) ]; then - wget --quiet ${URL1} -O "${JAR_DL}" &&\ - mv "${JAR_DL}" "${JAR}" - else - printf "You do not have curl or wget installed, please install sbt manually from https://www.scala-sbt.org/\n" - exit -1 - fi - fi - if [ ! -f "${JAR}" ]; then - # We failed to download - printf "Our attempt to download sbt locally to ${JAR} failed. Please install sbt manually from https://www.scala-sbt.org/\n" - exit -1 - fi - printf "Launching sbt from ${JAR}\n" - fi -} - -execRunner () { - # print the arguments one to a line, quoting any containing spaces - [[ $verbose || $debug ]] && echo "# Executing command line:" && { - for arg; do - if printf "%s\n" "$arg" | grep -q ' '; then - printf "\"%s\"\n" "$arg" - else - printf "%s\n" "$arg" - fi - done - echo "" - } - - "$@" -} - -addJava () { - dlog "[addJava] arg = '$1'" - java_args=( "${java_args[@]}" "$1" ) -} - -enableProfile () { - dlog "[enableProfile] arg = '$1'" - maven_profiles=( "${maven_profiles[@]}" "$1" ) - export SBT_MAVEN_PROFILES="${maven_profiles[@]}" -} - -addSbt () { - dlog "[addSbt] arg = '$1'" - sbt_commands=( "${sbt_commands[@]}" "$1" ) -} -addResidual () { - dlog "[residual] arg = '$1'" - residual_args=( "${residual_args[@]}" "$1" ) -} -addDebugger () { - addJava "-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=$1" -} - -# a ham-fisted attempt to move some memory settings in concert -# so they need not be dicked around with individually. 
-get_mem_opts () { - local mem=${1:-2048} - local perm=$(( $mem / 4 )) - (( $perm > 256 )) || perm=256 - (( $perm < 4096 )) || perm=4096 - local codecache=$(( $perm / 2 )) - - echo "-Xms${mem}m -Xmx${mem}m -XX:ReservedCodeCacheSize=${codecache}m" -} - -require_arg () { - local type="$1" - local opt="$2" - local arg="$3" - if [[ -z "$arg" ]] || [[ "${arg:0:1}" == "-" ]]; then - echo "$opt requires <$type> argument" 1>&2 - exit 1 - fi -} - -is_function_defined() { - declare -f "$1" > /dev/null -} - -process_args () { - while [[ $# -gt 0 ]]; do - case "$1" in - -h|-help) usage; exit 1 ;; - -v|-verbose) verbose=1 && shift ;; - -d|-debug) debug=1 && shift ;; - - -ivy) require_arg path "$1" "$2" && addJava "-Dsbt.ivy.home=$2" && shift 2 ;; - -mem) require_arg integer "$1" "$2" && sbt_mem="$2" && shift 2 ;; - -jvm-debug) require_arg port "$1" "$2" && addDebugger $2 && shift 2 ;; - -batch) exec 0.3.0 - ProblemFilters.exclude[ReversedMissingMethodProblem]("io.delta.standalone.DeltaLog.getChanges"), - ProblemFilters.exclude[ReversedMissingMethodProblem]("io.delta.standalone.DeltaLog.startTransaction"), - ProblemFilters.exclude[ReversedMissingMethodProblem]("io.delta.standalone.Snapshot.scan"), - ProblemFilters.exclude[ReversedMissingMethodProblem]("io.delta.standalone.DeltaLog.tableExists"), - - // Switch to using delta-storage LogStore API in 0.4.0 -> 0.5.0 - ProblemFilters.exclude[MissingClassProblem]("io.delta.standalone.storage.LogStore"), - - // Ignore missing shaded attributes - ProblemFilters.exclude[Problem]("shadedelta.*"), - - // Public API changes in 0.4.0 -> 0.5.0 - ProblemFilters.exclude[ReversedMissingMethodProblem]("io.delta.standalone.DeltaLog.getVersionBeforeOrAtTimestamp"), - ProblemFilters.exclude[ReversedMissingMethodProblem]("io.delta.standalone.DeltaLog.getVersionAtOrAfterTimestamp"), - - // ParquetSchemaConverter etc. were moved to project standalone-parquet - ProblemFilters.exclude[MissingClassProblem]("io.delta.standalone.util.ParquetSchemaConverter"), - ProblemFilters.exclude[MissingClassProblem]("io.delta.standalone.util.ParquetSchemaConverter$ParquetOutputTimestampType"), - - // Public API changes in 0.5.0 -> 0.6.0 - ProblemFilters.exclude[ReversedMissingMethodProblem]("io.delta.standalone.OptimisticTransaction.readVersion"), - - // scalastyle:on line.size.limit - ) -} diff --git a/connectors/project/build.properties b/connectors/project/build.properties deleted file mode 100644 index 3b06b0f4f51..00000000000 --- a/connectors/project/build.properties +++ /dev/null @@ -1,36 +0,0 @@ -# -# Licensed to the Apache Software Foundation (ASF) under one or more -# contributor license agreements. See the NOTICE file distributed with -# this work for additional information regarding copyright ownership. -# The ASF licenses this file to You under the Apache License, Version 2.0 -# (the "License"); you may not use this file except in compliance with -# the License. You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. -# - -# -# This file contains code from the Apache Spark project (original license above). 
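The ProblemFilters entries above are MiMa (sbt-mima-plugin) binary-compatibility excludes for the Standalone connector's public API. A minimal sketch of how such a list is typically wired into an sbt project; the module and artifact coordinates here are assumptions for illustration, not taken from the deleted build:

```
// Hypothetical wiring sketch for sbt-mima-plugin; names are illustrative only.
import com.typesafe.tools.mima.core._
import com.typesafe.tools.mima.plugin.MimaKeys._

lazy val standaloneExample = (project in file("standalone"))
  .settings(
    // Compare the compiled classes against the previously released artifact.
    mimaPreviousArtifacts := Set("io.delta" %% "delta-standalone" % "0.6.0"),
    // Suppress known, intentional API changes such as the ones listed above.
    mimaBinaryIssueFilters ++= Seq(
      ProblemFilters.exclude[Problem]("shadedelta.*")
    )
  )
```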
-# It contains modifications, which are licensed as follows: -# - -# -# Copyright (2020-present) The Delta Lake Project Authors. -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# http://www.apache.org/licenses/LICENSE-2.0 -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. -# - -sbt.version=1.6.1 diff --git a/connectors/project/plugins.sbt b/connectors/project/plugins.sbt deleted file mode 100644 index 0935860633e..00000000000 --- a/connectors/project/plugins.sbt +++ /dev/null @@ -1,41 +0,0 @@ -/* - * Copyright (2020-present) The Delta Lake Project Authors. - * - * Licensed under the Apache License, Version 2.0 (the "License"); - * you may not use this file except in compliance with the License. - * You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. - */ - -addSbtPlugin("com.github.sbt" % "sbt-release" % "1.0.15") - -addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "2.1.0") - -addSbtPlugin("io.get-coursier" % "sbt-coursier" % "1.0.3") - -addSbtPlugin("org.scalastyle" %% "scalastyle-sbt-plugin" % "1.0.0") - -addSbtPlugin("net.virtual-void" % "sbt-dependency-graph" % "0.10.0-RC1") - -addSbtPlugin("com.eed3si9n" % "sbt-unidoc" % "0.4.2") - -addSbtPlugin("com.github.sbt" % "sbt-pgp" % "2.1.2") - -addSbtPlugin("org.xerial.sbt" % "sbt-sonatype" % "3.9.15") - -addSbtPlugin("com.typesafe" % "sbt-mima-plugin" % "1.0.1") - -addSbtPlugin("com.etsy" % "sbt-checkstyle-plugin" % "3.1.1") - -addSbtPlugin("net.aichler" % "sbt-jupiter-interface" % "0.9.1") - -// By default, sbt-checkstyle-plugin uses checkstyle version 6.15, but we should set it to use the -// same version as Spark OSS (8.29) -dependencyOverrides += "com.puppycrawl.tools" % "checkstyle" % "8.29" diff --git a/connectors/version.sbt b/connectors/version.sbt deleted file mode 100644 index 853c0109790..00000000000 --- a/connectors/version.sbt +++ /dev/null @@ -1 +0,0 @@ -ThisBuild / version := "0.6.1-SNAPSHOT"
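Finally, version.sbt pins a single ThisBuild / version for every connector module; it is the value that feeds the CONNECTOR_VERSION constant and the JavaDoc window title earlier in this diff. A small, purely illustrative sketch (hypothetical task name, not part of the deleted build) of deriving a display version from it:

```
// Hypothetical helper task for illustration only.
lazy val printDisplayVersion = taskKey[Unit]("Print the version with -SNAPSHOT stripped")

printDisplayVersion := {
  val displayVersion = version.value.replaceAll("-SNAPSHOT", "")
  streams.value.log.info(s"Connector display version: $displayVersion")
}
```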