Skip to content

Conversation

@midavadim
Copy link
Contributor

@midavadim midavadim commented Jan 6, 2022

What

If dataset_if started with a number like '1id' then normalization transformed it to '_1id'

REAL Example:

select
id,
name,
type,
enabled,
updatedAt,
insertedAt,
personaTagIds,
contactListIds,
_airbyte_emitted_at,
_airbyte_workflows_hashid
from __dbt__CTE__workflows_ab3
-- workflows from dataline-integration-testing._111testing_raw._airbyte_raw_workflows

How

Removed conditions in
airbyte-integrations/bases/base-normalization/normalization/transform_catalog/destination_name_transformer.py

Manually tested that Bigquery allows the creation of dataset which starts with a number,
Table name also can start with a number

Recommended reading order

  1. x.java
  2. y.python

🚨 User Impact 🚨

No

Pre-merge Checklist

Expand the relevant checklist and delete the others.

New Connector

Community member or Airbyter

  • Community member? Grant edit access to maintainers (instructions)
  • Secrets in the connector's spec are annotated with airbyte_secret
  • Unit & integration tests added and passing. Community members, please provide proof of success locally e.g: screenshot or copy-paste unit, integration, and acceptance test output. To run acceptance tests for a Python connector, follow instructions in the README. For java connectors run ./gradlew :airbyte-integrations:connectors:<name>:integrationTest.
  • Code reviews completed
  • Documentation updated
    • Connector's README.md
    • Connector's bootstrap.md. See description and examples
    • docs/SUMMARY.md
    • docs/integrations/<source or destination>/<name>.md including changelog. See changelog example
    • docs/integrations/README.md
    • airbyte-integrations/builds.md
  • PR name follows PR naming conventions

Airbyter

If this is a community PR, the Airbyte engineer reviewing this PR is responsible for the below items.

  • Create a non-forked branch based on this PR and test the below items on it
  • Build is successful
  • Credentials added to Github CI. Instructions.
  • /test connector=connectors/<name> command is passing.
  • New Connector version released on Dockerhub by running the /publish command described here
  • After the connector is published, connector added to connector index as described here
  • Seed specs have been re-generated by building the platform and committing the changes to the seed spec files, as described here

Updating a connector

Community member or Airbyter

  • Grant edit access to maintainers (instructions)
  • Secrets in the connector's spec are annotated with airbyte_secret
  • Unit & integration tests added and passing. Community members, please provide proof of success locally e.g: screenshot or copy-paste unit, integration, and acceptance test output. To run acceptance tests for a Python connector, follow instructions in the README. For java connectors run ./gradlew :airbyte-integrations:connectors:<name>:integrationTest.
  • Code reviews completed
  • Documentation updated
    • Connector's README.md
    • Connector's bootstrap.md. See description and examples
    • Changelog updated in docs/integrations/<source or destination>/<name>.md including changelog. See changelog example
  • PR name follows PR naming conventions

Airbyter

If this is a community PR, the Airbyte engineer reviewing this PR is responsible for the below items.

  • Create a non-forked branch based on this PR and test the below items on it
  • Build is successful
  • Credentials added to Github CI. Instructions.
  • /test connector=connectors/<name> command is passing.
  • New Connector version released on Dockerhub by running the /publish command described here
  • After the new connector version is published, connector version bumped in the seed directory as described here
  • Seed specs have been re-generated by building the platform and committing the changes to the seed spec files, as described here

Connector Generator

  • Issue acceptance criteria met
  • PR name follows PR naming conventions
  • If adding a new generator, add it to the list of scaffold modules being tested
  • The generator test modules (all connectors with -scaffold in their name) have been updated with the latest scaffold by running ./gradlew :airbyte-integrations:connector-templates:generator:testScaffoldTemplates then checking in your changes
  • Documentation which references the generator is updated as needed.


This change is Reviewable

@CLAassistant
Copy link

CLAassistant commented Jan 6, 2022

CLA assistant check
All committers have signed the CLA.

@midavadim
Copy link
Contributor Author

midavadim commented Jan 6, 2022

/test connector=bases/base-normalization

🕑 bases/base-normalization https://github.com/airbytehq/airbyte/actions/runs/1663843102
❌ bases/base-normalization https://github.com/airbytehq/airbyte/actions/runs/1663843102
🐛

@jrhizor jrhizor temporarily deployed to more-secrets January 6, 2022 16:21 Inactive
@midavadim
Copy link
Contributor Author

midavadim commented Jan 11, 2022

/test connector=bases/base-normalization

🕑 bases/base-normalization https://github.com/airbytehq/airbyte/actions/runs/1682332167
❌ bases/base-normalization https://github.com/airbytehq/airbyte/actions/runs/1682332167
🐛

@octavia-squidington-iii octavia-squidington-iii temporarily deployed to more-secrets January 11, 2022 12:02 Inactive
Copy link
Contributor

@bazarnov bazarnov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some other normalisation tests are not passing. Is it related to this particular change you're making?
https://github.com/airbytehq/airbyte/runs/4774907109?check_suite_focus=true#step:10:20913

@midavadim midavadim temporarily deployed to more-secrets January 18, 2022 07:22 Inactive
@midavadim
Copy link
Contributor Author

midavadim commented Jan 18, 2022

/test connector=bases/base-normalization

🕑 bases/base-normalization https://github.com/airbytehq/airbyte/actions/runs/1711507998
❌ bases/base-normalization https://github.com/airbytehq/airbyte/actions/runs/1711507998
🐛

@octavia-squidington-iii octavia-squidington-iii temporarily deployed to more-secrets January 18, 2022 07:26 Inactive
@midavadim
Copy link
Contributor Author

midavadim commented Jan 18, 2022

/test connector=bases/base-normalization

🕑 bases/base-normalization https://github.com/airbytehq/airbyte/actions/runs/1711774383
❌ bases/base-normalization https://github.com/airbytehq/airbyte/actions/runs/1711774383
🐛

@octavia-squidington-iii octavia-squidington-iii temporarily deployed to more-secrets January 18, 2022 08:43 Inactive
@midavadim
Copy link
Contributor Author

midavadim commented Jan 18, 2022

  1. current update is related to bigquery only

  2. There is 1 failing test related to redshift test_nested_streams, it happens on the mater branch as well:

E AssertionError: docker run --rm --init -v /actions-runner/_work/airbyte/airbyte/airbyte-integrations/bases/base-normalization/integration_tests/normalization_test_output/redshift/test_nested_streams:/workspace -v /actions-runner/_work/airbyte/airbyte/airbyte-integrations/bases/base-normalization/integration_tests/normalization_test_output/redshift/test_nested_streams/build:/build -v /actions-runner/_work/airbyte/airbyte/airbyte-integrations/bases/base-normalization/integration_tests/normalization_test_output/redshift/test_nested_streams/logs:/logs -v /tmp:/tmp --network host --entrypoint /usr/local/bin/dbt -i airbyte/normalization:dev run --profiles-dir=/workspace --project-dir=/workspace --full-refresh
E terminated with return code 1 with 1 'Error/Warning/Fail' mention(s).
E assert 1 == 0
E +1
E -0

/actions-runner/_work/airbyte/airbyte/airbyte-integrations/bases/base-normalization/integration_tests/dbt_integration_test.py:465: AssertionError

  1. all local tests for bigquery are passed:
    NORMALIZATION_TEST_TARGET=bigquery ./gradlew :airbyte-integrations:bases:base-normalization:integrationTest
@midavadim
Copy link
Contributor Author

Some other normalisation tests are not passing. Is it related to this particular change you're making? https://github.com/airbytehq/airbyte/runs/4774907109?check_suite_focus=true#step:10:20913

replied above

@midavadim midavadim requested a review from bazarnov January 18, 2022 17:38
Copy link
Contributor

@antixar antixar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is minor changes but tests were failed. I see their failures are related with another reason but we need approval of Airbyte team for publishing of your changes without tests.

@ChristopheDuong
Copy link
Contributor

ChristopheDuong commented Jan 21, 2022

The official docs from Google BigQuery says:

Unquoted identifiers must begin with a letter or an underscore character. Subsequent characters can be letters, numbers, or underscores
These are invalid identifiers:
5Customers

https://cloud.google.com/bigquery/docs/reference/standard-sql/lexical

That's why it was implemented to insert an extra underscore

Why do we need to diverge from official recommendations? or do we need to adapt the logic for dataset and not table/column names?

@ChristopheDuong
Copy link
Contributor

Some other normalization tests are not passing. Is it related to this particular change you're making?

The normalization tests for redshift are failing indeed.
I understand it is not related to this PR that is touching only BigQuery, but all tests should pass in order to merge...

Can we look into why redshift is failing too?

Was there any changes related to json/super types in redshift?

Completed with 1 error and 0 warnings: Database Error in model simple_stream_with_namespace_resulting_into_long_names (models/generated/airbyte_incremental/test_normalization_namespace/simple_stream_with_namespace_resulting_into_long_names.sql) function json_extract_path_text(super, "unknown", boolean) does not exist HINT: No function matches the given name and argument types. You may need to add explicit type casts. compiled SQL at ../build/run/airbyte_utils/models/generated/airbyte_incremental/test_normalization_namespace/simple_stream_with_namespace_resulting_into_long_names.sql 
@midavadim
Copy link
Contributor Author

midavadim commented Feb 2, 2022

/test connector=bases/base-normalization

🕑 bases/base-normalization https://github.com/airbytehq/airbyte/actions/runs/1785880461
❌ bases/base-normalization https://github.com/airbytehq/airbyte/actions/runs/1785880461
🐛

@midavadim midavadim temporarily deployed to more-secrets February 2, 2022 19:52 Inactive
@octavia-squidington-iii octavia-squidington-iii temporarily deployed to more-secrets February 2, 2022 19:53 Inactive
@midavadim
Copy link
Contributor Author

midavadim commented Feb 3, 2022

/test connector=bases/base-normalization

🕑 bases/base-normalization https://github.com/airbytehq/airbyte/actions/runs/1790003459
❌ bases/base-normalization https://github.com/airbytehq/airbyte/actions/runs/1790003459
🐛

@octavia-squidington-iii octavia-squidington-iii temporarily deployed to more-secrets February 3, 2022 14:38 Inactive
@midavadim
Copy link
Contributor Author

midavadim commented Feb 3, 2022

/test connector=bases/base-normalization

🕑 bases/base-normalization https://github.com/airbytehq/airbyte/actions/runs/1790085928
✅ bases/base-normalization https://github.com/airbytehq/airbyte/actions/runs/1790085928
Python tests coverage:

 Name Stmts Miss Cover --------------------------------------------------------------------------------------------------------------------------------------------------- /actions-runner/_work/airbyte/airbyte/airbyte-integrations/bases/airbyte-protocol/airbyte_protocol/__init__.py 2 0 100% /actions-runner/_work/airbyte/airbyte/airbyte-integrations/bases/airbyte-protocol/airbyte_protocol/models/__init__.py 1 0 100% /actions-runner/_work/airbyte/airbyte/airbyte-integrations/bases/airbyte-protocol/airbyte_protocol/models/airbyte_protocol.py 124 0 100% base_python/__init__.py 13 0 100% base_python/catalog_helpers.py 10 6 40% base_python/cdk/__init__.py 0 0 100% base_python/cdk/abstract_source.py 89 64 28% base_python/cdk/streams/__init__.py 0 0 100% base_python/cdk/streams/auth/__init__.py 0 0 100% base_python/cdk/streams/auth/core.py 8 1 88% base_python/cdk/streams/auth/oauth.py 37 26 30% base_python/cdk/streams/auth/token.py 9 4 56% base_python/cdk/streams/core.py 63 32 49% base_python/cdk/streams/exceptions.py 10 2 80% base_python/cdk/streams/http.py 67 33 51% base_python/cdk/streams/rate_limiting.py 30 14 53% base_python/cdk/utils/__init__.py 0 0 100% base_python/cdk/utils/casing.py 4 0 100% base_python/cdk/utils/event_timing.py 47 3 94% base_python/client.py 56 33 41% base_python/entrypoint.py 70 56 20% base_python/integration.py 52 25 52% base_python/logger.py 33 15 55% base_python/schema_helpers.py 56 41 27% base_python/source.py 51 34 33% --------------------------------------------------------------------------------------------------------------------------------------------------- TOTAL 832 389 53% Name Stmts Miss Cover --------------------------------------------------------------------------------------------------------------------------------------------------- /actions-runner/_work/airbyte/airbyte/airbyte-integrations/bases/airbyte-protocol/airbyte_protocol/__init__.py 2 0 100% /actions-runner/_work/airbyte/airbyte/airbyte-integrations/bases/airbyte-protocol/airbyte_protocol/models/__init__.py 1 0 100% /actions-runner/_work/airbyte/airbyte/airbyte-integrations/bases/airbyte-protocol/airbyte_protocol/models/airbyte_protocol.py 124 0 100% normalization/__init__.py 4 0 100% normalization/destination_type.py 13 0 100% normalization/transform_catalog/__init__.py 2 0 100% normalization/transform_catalog/catalog_processor.py 143 77 46% normalization/transform_catalog/destination_name_transformer.py 155 9 94% normalization/transform_catalog/reserved_keywords.py 13 0 100% normalization/transform_catalog/stream_processor.py 520 333 36% normalization/transform_catalog/table_name_registry.py 174 34 80% normalization/transform_catalog/transform.py 45 26 42% normalization/transform_catalog/utils.py 33 7 79% normalization/transform_config/__init__.py 2 0 100% normalization/transform_config/transform.py 148 34 77% --------------------------------------------------------------------------------------------------------------------------------------------------- TOTAL 1379 520 62% Name Stmts Miss Cover ------------------------------------------------------------------------ source_acceptance_test/__init__.py 2 0 100% source_acceptance_test/base.py 10 4 60% source_acceptance_test/config.py 74 6 92% source_acceptance_test/tests/__init__.py 4 0 100% source_acceptance_test/tests/test_core.py 275 106 61% source_acceptance_test/tests/test_full_refresh.py 52 2 96% source_acceptance_test/tests/test_incremental.py 69 38 45% source_acceptance_test/utils/__init__.py 6 0 100% source_acceptance_test/utils/asserts.py 37 2 95% source_acceptance_test/utils/common.py 70 17 76% source_acceptance_test/utils/compare.py 62 23 63% source_acceptance_test/utils/connector_runner.py 110 48 56% source_acceptance_test/utils/json_schema_helper.py 105 13 88% ------------------------------------------------------------------------ TOTAL 876 259 70% Name Stmts Miss Cover --------------------------------------------------------------------------------------------------------------------------------------------------- /actions-runner/_work/airbyte/airbyte/airbyte-integrations/bases/airbyte-protocol/airbyte_protocol/__init__.py 2 0 100% /actions-runner/_work/airbyte/airbyte/airbyte-integrations/bases/airbyte-protocol/airbyte_protocol/models/__init__.py 1 0 100% /actions-runner/_work/airbyte/airbyte/airbyte-integrations/bases/airbyte-protocol/airbyte_protocol/models/airbyte_protocol.py 124 0 100% normalization/__init__.py 4 0 100% normalization/destination_type.py 13 0 100% normalization/transform_catalog/__init__.py 2 0 100% normalization/transform_catalog/catalog_processor.py 143 77 46% normalization/transform_catalog/destination_name_transformer.py 155 9 94% normalization/transform_catalog/reserved_keywords.py 13 0 100% normalization/transform_catalog/stream_processor.py 520 333 36% normalization/transform_catalog/table_name_registry.py 174 34 80% normalization/transform_catalog/transform.py 45 26 42% normalization/transform_catalog/utils.py 33 7 79% normalization/transform_config/__init__.py 2 0 100% normalization/transform_config/transform.py 148 34 77% --------------------------------------------------------------------------------------------------------------------------------------------------- TOTAL 1379 520 62% Name Stmts Miss Cover --------------------------------------------------------------------------------------------------------------------------------------------------- /actions-runner/_work/airbyte/airbyte/airbyte-integrations/bases/airbyte-protocol/airbyte_protocol/__init__.py 2 0 100% /actions-runner/_work/airbyte/airbyte/airbyte-integrations/bases/airbyte-protocol/airbyte_protocol/models/__init__.py 1 0 100% /actions-runner/_work/airbyte/airbyte/airbyte-integrations/bases/airbyte-protocol/airbyte_protocol/models/airbyte_protocol.py 124 0 100% normalization/__init__.py 4 0 100% normalization/destination_type.py 13 0 100% normalization/transform_catalog/__init__.py 2 0 100% normalization/transform_catalog/catalog_processor.py 143 12 92% normalization/transform_catalog/destination_name_transformer.py 155 5 97% normalization/transform_catalog/reserved_keywords.py 13 0 100% normalization/transform_catalog/stream_processor.py 520 39 92% normalization/transform_catalog/table_name_registry.py 174 51 71% normalization/transform_catalog/transform.py 45 30 33% normalization/transform_catalog/utils.py 33 0 100% normalization/transform_config/__init__.py 2 0 100% normalization/transform_config/transform.py 148 46 69% --------------------------------------------------------------------------------------------------------------------------------------------------- TOTAL 1379 183 87% 
@octavia-squidington-iii octavia-squidington-iii temporarily deployed to more-secrets February 3, 2022 14:56 Inactive
@midavadim
Copy link
Contributor Author

Some other normalization tests are not passing. Is it related to this particular change you're making?

The normalization tests for redshift are failing indeed. I understand it is not related to this PR that is touching only BigQuery, but all tests should pass in order to merge...

Can we look into why redshift is failing too?

Was there any changes related to json/super types in redshift?

Completed with 1 error and 0 warnings: Database Error in model simple_stream_with_namespace_resulting_into_long_names (models/generated/airbyte_incremental/test_normalization_namespace/simple_stream_with_namespace_resulting_into_long_names.sql) function json_extract_path_text(super, "unknown", boolean) does not exist HINT: No function matches the given name and argument types. You may need to add explicit type casts. compiled SQL at ../build/run/airbyte_utils/models/generated/airbyte_incremental/test_normalization_namespace/simple_stream_with_namespace_resulting_into_long_names.sql 

This problem was caused by overlapping tests from different branches.
Tests from another branch created tables with SUPER type, so that's why my tests failed.

Once we re-created redshift database my tests started working.

Can not start with number: column (added _ before name)
@midavadim midavadim temporarily deployed to more-secrets February 3, 2022 19:48 Inactive
@midavadim
Copy link
Contributor Author

midavadim commented Feb 3, 2022

The official docs from Google BigQuery says:

Unquoted identifiers must begin with a letter or an underscore character. Subsequent characters can be letters, numbers, or underscores
These are invalid identifiers:
5Customers

https://cloud.google.com/bigquery/docs/reference/standard-sql/lexical

That's why it was implemented to insert an extra underscore

Why do we need to diverge from official recommendations? or do we need to adapt the logic for dataset and not table/column names?

I have tested different names on Bigquery:

Can not start with number: column (GCP also automatically adds _ to column names which starts with number during CSV upload)
Can start with number: datasetId, table (and they do not require quotes), both queries are valid:
SELECT name1, _2name, name_3 FROM `dataline-integration-testing.1.1tt` LIMIT 1000
SELECT name1, _2name, name_3 FROM dataline-integration-testing.1.1tt LIMIT 1000

@ChristopheDuong I tested only datasetId, table, columns ... is there another identifier that we should consider for big query?

Copy link
Contributor

@ChristopheDuong ChristopheDuong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great find thanks

@midavadim midavadim changed the title use unchanged dataset_id if it starts with a number 🐛 normalization for bigquery: allow datasetId and table to start with number Feb 5, 2022
@github-actions github-actions bot added the area/documentation Improvements or additions to documentation label Feb 7, 2022
@midavadim midavadim temporarily deployed to more-secrets February 7, 2022 07:46 Inactive
@midavadim
Copy link
Contributor Author

midavadim commented Feb 7, 2022

/publish connector=bases/base-normalization

🕑 bases/base-normalization https://github.com/airbytehq/airbyte/actions/runs/1805180085
✅ bases/base-normalization https://github.com/airbytehq/airbyte/actions/runs/1805180085

@octavia-squidington-iii octavia-squidington-iii temporarily deployed to more-secrets February 7, 2022 07:49 Inactive
@github-actions github-actions bot added area/platform issues related to the platform area/worker Related to worker labels Feb 7, 2022
@midavadim midavadim temporarily deployed to more-secrets February 7, 2022 09:00 Inactive
@midavadim midavadim merged commit b447bb5 into master Feb 7, 2022
@midavadim midavadim deleted the midavadim/6732-destination-bigquery-normalization branch February 7, 2022 09:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/documentation Improvements or additions to documentation area/platform issues related to the platform area/worker Related to worker normalization

8 participants