
1.18.0 - 2024-07-11

Added

  • Spark: configurable integration test #2755 @pawel-big-lebowski
    Provides a command-line tool capable of running Spark integration tests that can be written without Java.
  • Spark: OpenLineage Spark extension interfaces without runtime dependency hell #2809 #2837 @ddebowczyk92
    Adds new Spark extension interfaces that avoid runtime dependency hell. Includes a test verifying that the integration works properly.
  • Spark: support latest versions 3.4.3 and 3.5.1. #2743 @pawel-big-lebowski
    Upgrades CI workflows to run tests against the latest Spark versions: 3.4.2 -> 3.4.3 and 3.5.0 -> 3.5.1.
  • Spark: add extraction of the masking property in column-level lineage #2789 @tnazarew
    Adds extraction of the masking property during collection of dependencies for ColumnLineageDatasetFacet creation.
  • Spark: collect table name from InsertIntoHadoopFsRelationCommand #2794 @dolfinus
    Collects the table name for INSERT INTO commands on tables created with the USING $fileFormat syntax, such as USING orc.
  • Spark, Flink: add PostgresJdbcExtractor #2806 @dolfinus
    Adds the default port, 5432, to Postgres namespaces.
  • Spark, Flink: add TeradataJdbcExtractor #2826 @dolfinus
    Converts JDBC URLs like jdbc:teradata://host/DBS_PORT=1024,DATABASE=somedb to datasets with namespace teradata://host:1024 and name somedb.table.
  • Spark, Flink: add MySqlJdbcExtractor #2825 @dolfinus
    Handles different MySQL JDBC URL formats and produces datasets with consistent namespaces, like mysql://host:port.
  • Spark, Flink: add OracleJdbcExtractor #2824 @dolfinus
    Handles simple Oracle JDBC URLs, like jdbc:oracle:thin:@//host:port/serviceName and jdbc:oracle:thin:@host:port:sid, and converts each to a dataset with namespace oracle://host:port and name serviceName.schema.table or sid.schema.table, respectively.
  • Spark: configurable test with Docker image provided #2822 @pawel-big-lebowski
    Extends the configurable integration test feature so that the Docker image to run the tests against can be supplied by name.
  • Spark: Support Iceberg 1.4 on Spark 3.5.1. #2838 @pawel-big-lebowski
    Includes Iceberg support for Spark 3.5. Fixes the column-level lineage facet for UNION queries.
  • Spec: add example for change in #2756 #2801 @Sheeri
    Updates the customLineage facet test for the new syntax introduced in #2756.
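As a rough illustration of the Postgres, Teradata, and Oracle entries above, here is a minimal Python sketch of the URL-to-dataset mappings they describe. It is not the OpenLineage extractor code (which lives in the Java integration): the function names are hypothetical, the postgres:// namespace scheme is an assumption, the table argument is assumed to already include any schema prefix, and the Teradata helper only handles URLs that set both DBS_PORT and DATABASE.

```python
import re
from urllib.parse import urlsplit

# Hypothetical helpers illustrating JDBC URL -> (namespace, name) mappings.
# Not OpenLineage's implementation; a sketch of the behavior described above.


def postgres_dataset(url: str, table: str) -> tuple[str, str]:
    """jdbc:postgresql://host[:port]/database -> (postgres://host:port, database.table)."""
    parts = urlsplit(url.removeprefix("jdbc:"))
    port = parts.port or 5432  # fall back to the default Postgres port
    database = parts.path.lstrip("/")
    # Assumption: namespace scheme is "postgres://".
    return f"postgres://{parts.hostname}:{port}", f"{database}.{table}"


# jdbc:teradata://host/KEY=VALUE,KEY=VALUE,...
TERADATA_URL = re.compile(r"^jdbc:teradata://(?P<host>[^/]+)/(?P<params>.*)$")


def teradata_dataset(url: str, table: str) -> tuple[str, str]:
    """Reads DBS_PORT and DATABASE from the comma-separated parameter list."""
    m = TERADATA_URL.match(url)
    if m is None:
        raise ValueError(f"not a Teradata JDBC URL: {url!r}")
    params = dict(p.split("=", 1) for p in m["params"].split(",") if p)
    # Assumption: both parameters are present (KeyError otherwise).
    return f"teradata://{m['host']}:{params['DBS_PORT']}", f"{params['DATABASE']}.{table}"


# jdbc:oracle:thin:@//host:port/serviceName
ORACLE_SERVICE_URL = re.compile(
    r"^jdbc:oracle:thin:@//(?P<host>[^:/]+):(?P<port>\d+)/(?P<db>[^/?]+)$"
)
# jdbc:oracle:thin:@host:port:sid
ORACLE_SID_URL = re.compile(
    r"^jdbc:oracle:thin:@(?P<host>[^:/]+):(?P<port>\d+):(?P<db>\w+)$"
)


def oracle_dataset(url: str, table: str) -> tuple[str, str]:
    """Service-name and SID URLs both map to oracle://host:port."""
    m = ORACLE_SERVICE_URL.match(url) or ORACLE_SID_URL.match(url)
    if m is None:
        raise ValueError(f"unsupported Oracle JDBC URL: {url!r}")
    return f"oracle://{m['host']}:{m['port']}", f"{m['db']}.{table}"


print(teradata_dataset("jdbc:teradata://host/DBS_PORT=1024,DATABASE=somedb", "table"))
```

The call above returns ('teradata://host:1024', 'somedb.table'), matching the Teradata example in the changelog. Tolerating other URL variants (extra Oracle descriptors, MySQL's multiple formats) would need more patterns than this sketch covers.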

Changed

  • Spark: fallback to spark.sql.warehouse.dir as table namespace #2767 @dolfinus
    When a metastore is not used, falls back to spark.sql.warehouse.dir or hive.metastore.warehouse.dir as the table namespace instead of duplicating the table's location.

Fixed

  • Java: handle dashes in hostname for JdbcExtractors #2830 @dolfinus
    Properly handles dashes in JDBC URL hostnames.
  • Spark: fix Glue symlinks formatting bug #2807 @Akash2351
    Fixes Glue symlinks by parsing the Glue catalogid from the configuration.
  • Spark, Flink: fix DBFS namespace format #2800 @dolfinus
    Fixes the DBFS namespace format.
  • Spark: fix Glue naming format #2766 @dolfinus
    Changes the AWS Glue namespace to match the Glue ARN format in the AWS documentation.
  • Spark: fix Iceberg dataset location #2797 @dolfinus
    Fixes the Iceberg dataset namespace: uses file:/some/path/database/table instead of file:/some/path/database.table. For the dataset's TABLE symlink, uses the warehouse location instead of the database location.
  • Spark: fix NPE and incorrect comment #2827 @pawel-big-lebowski
    Fixes an error introduced by a recent Spark version upgrade, which existing tests did not catch.
  • Spark: convert scheme and authority to lowercase in JdbcLocation #2831 @dolfinus
    Converts the scheme and authority of valid JDBC URLs to lowercase while leaving the instance/database name intact, because databases differ in their default case and case-sensitivity rules.
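The two JDBC location fixes above (dashes in hostnames, lowercase scheme and authority) can be illustrated with a minimal Python sketch. normalize_location and its regular expression are hypothetical and do not reproduce OpenLineage's JdbcLocation class. The sketch assumes the jdbc: prefix has already been stripped from the URL.

```python
import re

# Hypothetical pattern: scheme://authority[/rest], where the host part of the
# authority may contain letters, digits, dots, and dashes (e.g. "My-DB-Host").
LOCATION = re.compile(
    r"^(?P<scheme>[A-Za-z][A-Za-z0-9+.-]*)://"
    r"(?P<authority>[A-Za-z0-9.\-]+(?::\d+)?)"
    r"(?P<rest>/.*)?$"
)


def normalize_location(url: str) -> str:
    """Lowercase scheme and authority; keep the database/instance part as-is."""
    m = LOCATION.match(url)
    if m is None:
        raise ValueError(f"unsupported location: {url!r}")
    rest = m["rest"] or ""  # database name keeps its original case
    return f"{m['scheme'].lower()}://{m['authority'].lower()}{rest}"


print(normalize_location("PostgreSQL://My-DB-Host:5432/SalesDB"))
```

The call above returns postgresql://my-db-host:5432/SalesDB: the scheme and host are lowercased, while SalesDB is left alone because its case may be significant to the database. In this sketch, a host pattern without the dash, such as [A-Za-z0-9.]+, would stop at My and reject the URL.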