Discussions
Engage in dynamic conversations covering diverse topics within the Databricks Community. Explore discussions on data engineering, machine learning, and more. Join the conversation and expand your knowledge base with insights from experts and peers.

Community Discussions

Engage in vibrant discussions covering diverse learning topics within the Databricks Community. Expl...

4192 Posts

Activity in Discussions

Rajeshwar_Reddy
by > New Contributor II
  • 2011 Views
  • 4 replies
  • 0 kudos

ODBC connection issue Simba 64 bit

Hello All, I am getting the error below when trying to create a Simba 64-bit ODBC DSN on my local system to connect to a Databricks server using a token, with SSL (system trust store) enabled and Thrift Transport set to HTTP. Any helping hand is really appreciated. [Simba][ThriftE...

Latest Reply
DougJames
New Contributor II

Solved for my case. Still not sure why/how it was working on one server but not the other. The final fix was to add the HTTPPath value to the connection string I listed above.

  • 0 kudos
3 More Replies
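
The fix DougJames describes, adding an HTTPPath value to the connection string, might look like the sketch below. The original connection string is not shown on this page, so every value here (host, HTTP path, driver name, token) is a placeholder. The key names are assumed from the Simba Spark ODBC driver's DSN-less options (AuthMech=3 with UID=token for a personal access token, ThriftTransport=2 for HTTP) and should be checked against the installed driver version.

```python
def build_connection_string(host, http_path, token,
                            driver="Simba Spark ODBC Driver", port=443):
    """Assemble a DSN-less ODBC connection string (all values are placeholders)."""
    if not http_path:
        # The reply above reports that adding HTTPPath was the final fix.
        raise ValueError("HTTPPath is required")
    parts = {
        "Driver": driver,        # assumed driver name; use the one installed locally
        "Host": host,
        "Port": port,
        "HTTPPath": http_path,
        "SSL": 1,                # SSL enabled, as in the original question
        "ThriftTransport": 2,    # assumed value for HTTP transport
        "AuthMech": 3,           # assumed value for user name / password (token) auth
        "UID": "token",
        "PWD": token,
    }
    return ";".join(f"{key}={value}" for key, value in parts.items())


conn_str = build_connection_string(
    host="adb-0000000000000000.0.azuredatabricks.net",  # hypothetical workspace host
    http_path="/sql/1.0/warehouses/0000000000000000",    # hypothetical HTTP path
    token="<personal-access-token>",
)
print(conn_str)
```

Raising an error when the HTTP path is empty makes the omission visible at build time rather than as a driver-level connection failure.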
Daniela_Boamba
by > New Contributor III
  • 4 Views
  • 0 replies
  • 0 kudos

Databricks certificate expired

Hello, I have a Databricks workspace with SSO authentication. The IdP is on Azure. The client certificate expired and now I can't log on to Databricks to add the new one. What can I do? Any idea is welcome. Thank you!! Best regards, Daniela

Mous92i
by > New Contributor
  • 121 Views
  • 3 replies
  • 1 kudos

Resolved! Liquid Clustering With Merge

Hello, I’m facing severe performance issues with a MERGE INTO in Databricks. merge_condition = """ source.data_hierarchy = target.data_hierarchy AND source.sensor_id = target.sensor_id AND source.timestamp = target.timestamp """ The target Delt...

Latest Reply
Mous92i
New Contributor

Thanks for your response

  • 1 kudos
2 More Replies
viniciuscini
by > New Contributor
  • 5198 Views
  • 2 replies
  • 0 kudos

Improve query performance of direct query with Databricks

I’m building a dashboard in Power BI’s Pro Workspace, connecting data via Direct Query from Databricks (around 60 million rows from 15 combined tables), using a serverless SQL warehouse (small size and 4 clusters). The problem is that the dashboard is taking to...

Latest Reply
ArekKemp
Visitor

@viniciuscini have you managed to get it working well for you?

  • 0 kudos
1 More Replies
databricksero
by > New Contributor
  • 231 Views
  • 8 replies
  • 3 kudos

DLT pipeline fails with “can not infer schema from empty dataset” — works fine when run manually

Hi everyone, I’m running into an issue with a Delta Live Tables (DLT) pipeline that processes a few transformation layers (raw → intermediate → primary → feature). When I trigger the entire pipeline, it fails with the following error: can not infer sche...

Latest Reply
ManojkMohan
Honored Contributor

@databricksero Explicit Schema Definition: When calling spark.createDataFrame(pdf_cleaned), explicitly provide the schema even if the DataFrame is empty. This means Spark does not have to infer the types and prevents the “cannot infer schema from empty dataset” erro...

  • 3 kudos
7 More Replies
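
The explicit-schema suggestion in the reply above could be sketched as follows. This is a minimal illustration, not the poster's pipeline code: the helper name `to_spark_with_schema`, the column names, and the types in `FEATURE_SCHEMA` are all hypothetical. It relies on `spark.createDataFrame` accepting a DDL-formatted string for its `schema` argument, and takes the session as a parameter so the helper itself has no Spark import.

```python
def to_spark_with_schema(spark, pdf, schema_ddl):
    """Create a Spark DataFrame from a pandas DataFrame using an explicit schema.

    Passing `schema` means Spark does not have to infer column types from the
    rows, which is the step that fails when the pandas DataFrame is empty.
    """
    return spark.createDataFrame(pdf, schema=schema_ddl)


# Hypothetical columns; replace with the real schema of pdf_cleaned.
FEATURE_SCHEMA = "sensor_id STRING, event_ts TIMESTAMP, value DOUBLE"

# In a pipeline, assuming `spark` and `pdf_cleaned` already exist:
# sdf = to_spark_with_schema(spark, pdf_cleaned, FEATURE_SCHEMA)
```

Keeping the schema in one named constant also documents the expected output of the transformation layer, whether or not a given run produces rows.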
Rezakorehi
by > New Contributor II
  • 400 Views
  • 7 replies
  • 11 kudos

Unity catalogues - What would you do

If you were creating Unity Catalogs again, what would you do differently based on your past experience?

Latest Reply
Hubert-Dudek
Esteemed Contributor III

@nayan_wylde No, don't do that, hehe. It was an example of an extreme approach. Usually, use catalogs to separate environments and, in enterprises, to separate divisions such as a customer tower, marketing tower, finance tower, etc.

  • 11 kudos
6 More Replies
sjujjuru
by > Visitor
  • 56 Views
  • 2 replies
  • 1 kudos

REGARDING COUPON CODE

After completing all the relevant courses for the certification, I haven’t received the coupon code yet.

Latest Reply
Hubert-Dudek
Esteemed Contributor III

Wait until 15th November; if you still have no voucher, open a support ticket.

  • 1 kudos
1 More Replies
YuriS
by > New Contributor II
  • 280 Views
  • 3 replies
  • 1 kudos

Resolved! How to reduce data loss for Delta Lake on Azure when failing from primary to secondary regions?

Let’s say we have a big data application where data loss is not an option. With GZRS (geo-zone-redundant storage) redundancy we would achieve zero data loss if the primary region is alive – the writer waits for acks from two or more Azure availability zo...

Latest Reply
Hubert-Dudek
Esteemed Contributor III

Databricks is working on improvements and new functionality related to that. For now, the only solution is a DEEP CLONE. You can run it more frequently or implement your own replication based on a change data feed. You could use delta sharing for tha...

  • 1 kudos
2 More Replies
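
The DEEP CLONE approach mentioned in the reply above can be sketched as a statement builder. The table names are hypothetical, and the scheduling mechanism (for example a job that calls `spark.sql`) is an assumption rather than something stated in the thread.

```python
def deep_clone_statement(source_table, target_table):
    """Build a Delta Lake DEEP CLONE statement (table names are placeholders)."""
    return f"CREATE OR REPLACE TABLE {target_table} DEEP CLONE {source_table}"


stmt = deep_clone_statement("prod.sales.orders", "dr.sales.orders")
print(stmt)
# On Databricks this would be run on a schedule, for example: spark.sql(stmt)
```

Anything written to the source after the most recent clone is not yet in the copy, which is why the reply suggests running the clone more frequently.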
deng_dev
by > New Contributor III
  • 11238 Views
  • 1 reply
  • 0 kudos

py4j.protocol.Py4JJavaError: An error occurred while calling o359.sql. : java.util.NoSuchElementExce

Hi! We are creating a table in a streaming job every micro-batch using the spark.sql('create or replace table ... using delta as ...') command. This query combines data from multiple tables. Sometimes our job fails with the error: py4j.Py4JException: An e...

Latest Reply
sahilchavan
New Contributor II

Hi @deng_dev, did you discover any way to raise this error gracefully? I'm facing the same error when running the Kinesis stream. Although I'm aware of what the error is, my intent is to raise and log the error gracefully.

  • 0 kudos
Brahmareddy
by > Esteemed Contributor
  • 112 Views
  • 1 reply
  • 4 kudos

I Tried Teaching Databricks About Itself — Here’s What Happened

Hi All, how are you doing today? I wanted to share something interesting from my recent Databricks work: I’ve been playing around with an idea I call “Real-Time Metadata Intelligence.” Most of us focus on optimizing data pipelines, query performance,...

Latest Reply
ruicarvalho_de
New Contributor II

I like the core idea. You are mining signals the platform already emits. I would start with rules first: track the small-files ratio and the average file size trend, and watch skew per partition and shuffle bytes per input gigabyte. Compare job time to input size to c...

  • 4 kudos
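
Two of the rule-based signals named in the reply above (small-files ratio with average file size, and shuffle bytes per input gigabyte) can be computed with plain arithmetic. The sketch below assumes the file sizes and byte counts have already been collected from table or job metadata; the 32 MiB threshold for a "small" file and the function names are illustrative assumptions, not values from the thread.

```python
SMALL_FILE_BYTES = 32 * 1024 * 1024  # assumed threshold for a "small" file
GIB = 1024 ** 3


def file_size_signals(file_sizes, small_threshold=SMALL_FILE_BYTES):
    """Return the small-files ratio and the average file size in bytes."""
    if not file_sizes:
        return {"small_files_ratio": 0.0, "avg_file_size": 0.0}
    small = sum(1 for size in file_sizes if size < small_threshold)
    return {
        "small_files_ratio": small / len(file_sizes),
        "avg_file_size": sum(file_sizes) / len(file_sizes),
    }


def shuffle_bytes_per_input_gb(shuffle_bytes, input_bytes):
    """Shuffle volume normalised by input size, in bytes per gigabyte read."""
    if input_bytes <= 0:
        raise ValueError("input_bytes must be positive")
    return shuffle_bytes / (input_bytes / GIB)


MIB = 1024 * 1024
signals = file_size_signals([1 * MIB, 2 * MIB, 128 * MIB, 256 * MIB])
print(signals)
```

Tracking these values per run, rather than once, is what turns them into the trend the reply refers to.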
Bhavana_Y
by > New Contributor
  • 42 Views
  • 1 reply
  • 1 kudos

Learning Path for Spark Developer Associate

Hello Everyone, happy to be a part of the Virtual Journey!! I enrolled in Associate Spark Developer and completed the learning path in Databricks Academy. Can anyone please confirm whether completing the learning path is enough to obtain the 50% off voucher for certifi...

Screenshot (15).png
Latest Reply
Advika
Databricks Employee

Hello @Bhavana_Y! To be eligible for the incentives, you’ll need to complete one of the pathways mentioned in the Learning Festival post. Based on your screenshot, it looks like you’ve completed all four modules of LEARNING PATHWAY 7: APACHE SPARK DE...

  • 1 kudos
YugandharG
by > New Contributor
  • 48 Views
  • 1 reply
  • 0 kudos

Lakebase storage location

Hi, I'm a Solution Architect from a reputed insurance company looking for a few key pieces of technical information about the Lakebase architecture. Although it is a fully managed serverless OLTP offering from Databricks, there is no clear documentation that talks about data st...

Latest Reply
szymon_dybczak
Esteemed Contributor III

Hi @YugandharG, 1. Lakebase data is stored in Databricks-managed cloud object storage. There's no option to use customer storage as of now. 2. File format: vanilla Postgres pages. The storage format of Postgres has nothing to do with Parquet/Delta. Wa...

  • 0 kudos
donlxz
by > New Contributor III
  • 122 Views
  • 4 replies
  • 3 kudos

Resolved! deadlock occurs with use statement

When issuing a query from Informatica using a Delta connection, the statement use catalog_name.schema_name is executed first. At that time, the following error appeared in the query history: Query could not be scheduled: (conn=5073499) Deadlock found w...

Latest Reply
donlxz
New Contributor III

I’ll try making adjustments on the Informatica side. Thank you for your help.

  • 3 kudos
3 More Replies
Jonathan_
by > New Contributor II
  • 151 Views
  • 4 replies
  • 6 kudos

Slow PySpark operations after long DAG that contains many joins and transformations

We are using PySpark and notice that when we do many transformations/aggregations/joins on the data, at some point the execution time of simple tasks (count, display, union of 2 tables, ...) becomes very slow even if we have small data (ex...

Latest Reply
tarunnagar
New Contributor

This is a pretty common issue with PySpark when working on large DAGs with lots of joins and transformations. As the DAG grows, Spark has to maintain a huge execution plan, and performance can drop due to shuffling, serialization, and memory overhead...

  • 6 kudos
3 More Replies
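
The visible part of the reply above explains the cause (a growing execution plan) but is cut off before any remedy. One commonly used option, assumed here rather than taken from the thread, is to truncate the lineage periodically with `DataFrame.localCheckpoint()`. In the sketch below, the helper name and the `every` parameter are hypothetical, and `localCheckpoint()` may not be available on every compute type.

```python
def apply_with_checkpoints(df, steps, every=5):
    """Apply transformation functions in order, truncating the plan periodically.

    `steps` is a list of functions that each take and return a DataFrame.
    `every` (an assumed tuning knob) is how many steps run between checkpoints.
    """
    for index, step in enumerate(steps, start=1):
        df = step(df)
        if index % every == 0:
            # localCheckpoint() materialises the data and cuts the logical plan,
            # so later operations no longer carry the full upstream lineage.
            df = df.localCheckpoint()
    return df
```

A call might look like `apply_with_checkpoints(raw_df, [add_features, join_lookup, aggregate], every=2)`, where the three step functions are placeholders for the real transformations. Checkpointing costs a materialisation each time, so a small `every` trades planning time for extra compute and storage.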
mikvaar
by > New Contributor III
  • 586 Views
  • 8 replies
  • 5 kudos

DAB + DLT destroy fails due to ownership/permissions mismatch

Hi all, We are running into an issue with Databricks Asset Bundles (DAB) when trying to destroy a DLT pipeline. Setup is as follows: Two separate service principals: Deployment SP: used by Azure DevOps for deploying bundles. Run_as SP: used for running t...

Data Engineering
Databricks
Databricks Asset Bundles
DevOps
Latest Reply
denis-dbx
Databricks Employee

We just released https://github.com/databricks/cli/releases/tag/v0.273.0 with a mitigation for this; the error should disappear if you upgrade. Please try it and let us know how it goes. The Terraform fix is in https://github.com/databricks/terraform-provid...

  • 5 kudos
7 More Replies