- Notifications
You must be signed in to change notification settings - Fork 135
Labels
api: spannerIssues related to the googleapis/java-spanner API.Issues related to the googleapis/java-spanner API.priority: p1Important issue which blocks shipping the next release. Will be fixed prior to next release.Important issue which blocks shipping the next release. Will be fixed prior to next release.type: bugError or flaw in code with unintended results or allowing sub-optimal usage patterns.Error or flaw in code with unintended results or allowing sub-optimal usage patterns.
Description
There are 1 billion old records need to be cleaned. According the documents, It seems Partitioned DML should be the good choice. I choose java client because we can set timeout by using setPartitionedDmlTimeout(Duration.ofHours(24L)). However, I still experienced the '504 Deadline Exceeded' after a few hours that shorter than 24 hours I set. Why setPartitionedDmlTimeout did not work? How should I solve the problem? Is it not possible to delete so large number of data by using Partitioned DML.
Following are the details.
Environment details
- Spanner
3 nodes, 2 billion records in a table and 1 billion old records need to be cleaned - OS type and version: MacOs version 10.15.3
- Java version: java 8
- spanner version(s): the latest version
Steps to reproduce
- Write 2 billion records with timestamp in to the spanner table with dataflow
- Run a clean job to delete the old 1 billion records with Partitioned DML by java client
- Set the timeout to be 24 hours by using setPartitionedDmlTimeout(Duration.ofHours(24L))
- Run the java code
Code example
SpannerOptions options = SpannerOptions.newBuilder().setPartitionedDmlTimeout(Duration.ofHours(24L)).build(); Spanner spanner = options.getService(); try { DatabaseId db = DatabaseId.of(options.getProjectId(), args[0], args[1]); // [END init_client] //This will return the default project id based on the environment. String clientProject = spanner.getOptions().getProjectId(); if (!db.getInstanceId().getProject().equals(clientProject)) { System.err.println( "Invalid project specified. Project in the database id should match the" + "project name set in the environment variable GOOGLE_CLOUD_PROJECT. Expected: " + clientProject); } // [START init_client] DatabaseClient dbClient = spanner.getDatabaseClient(db); SimpleDateFormat formatter = new SimpleDateFormat("dd/MM/yyyy hh:mm:ss.SSS"); String sql="DELETE FROM MyTable WHERE updateTimestamp<'2020-05-02T03:00:00Z'"; Date currentDate = new Date(); System.out.println(String.format("Delete Start:%s",formatter.format(currentDate))); long lStartTime = Instant.now().toEpochMilli(); deleteUsingPartitionedDml(dbClient,sql); long lEndTime = Instant.now().toEpochMilli(); long elaspedTime = lEndTime - lStartTime; System.out.println("Elapsed time in : " + elaspedTime/1000); } finally { spanner.close(); } // [END init_client] System.out.println("Closed client"); } private static void deleteUsingPartitionedDml(DatabaseClient dbClient,String sql) { long rowCount = dbClient.executePartitionedUpdate(Statement.of(sql)); System.out.printf("%d records deleted.\n", rowCount); }
Metadata
Metadata
Assignees
Labels
api: spannerIssues related to the googleapis/java-spanner API.Issues related to the googleapis/java-spanner API.priority: p1Important issue which blocks shipping the next release. Will be fixed prior to next release.Important issue which blocks shipping the next release. Will be fixed prior to next release.type: bugError or flaw in code with unintended results or allowing sub-optimal usage patterns.Error or flaw in code with unintended results or allowing sub-optimal usage patterns.