
Commit a218038

HADOOP-17139 Re-enable optimized copyFromLocal implementation in S3AFileSystem (#3101)
This work:

* Defines the behavior of FileSystem.copyFromLocal in filesystem.md
* Implements a high-performance copyFromLocalOperation for S3
* Adds a contract test for the operation: AbstractContractCopyFromLocalTest
* Implements the contract tests for the Local and S3A file systems

Contributed by: Bogdan Stolojan
1 parent 6d77f3b commit a218038


7 files changed: +1171 / -180 lines


hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FileUtil.java

Lines changed: 3 additions & 0 deletions
@@ -524,6 +524,9 @@ private static Path checkDest(String srcName, FileSystem dstFS, Path dst,
     if (null != sdst) {
       if (sdst.isDirectory()) {
         if (null == srcName) {
+          if (overwrite) {
+            return dst;
+          }
           throw new PathIsDirectoryException(dst.toString());
         }
         return checkDest(null, dstFS, new Path(dst, srcName), overwrite);
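The added branch changes one case. When `checkDest` is given an existing directory and no source name, it now returns that directory instead of throwing, as long as `overwrite` is set. The Python sketch below is a hypothetical model of `checkDest` and is not Hadoop code. Paths are plain strings, and the destination file system is two sets (`dirs`, `files`). `check_dest` and the two exception classes are illustrative stand-ins. The branch for an existing file is an assumption about code outside this hunk.

```python
from typing import Optional, Set


class PathIsDirectoryError(Exception):
    """Stand-in for Hadoop's PathIsDirectoryException."""


class PathExistsError(Exception):
    """Stand-in for Hadoop's PathExistsException."""


def check_dest(src_name: Optional[str], dirs: Set[str], files: Set[str],
               dst: str, overwrite: bool) -> str:
    """Hypothetical model of FileUtil.checkDest after this commit."""
    if dst in dirs:
        if src_name is None:
            # The new branch: an existing directory is an acceptable target
            # when the caller asked to overwrite.
            if overwrite:
                return dst
            raise PathIsDirectoryError(dst)
        # Copying into a directory: re-check with the source name appended.
        return check_dest(None, dirs, files,
                          dst.rstrip("/") + "/" + src_name, overwrite)
    if dst in files and not overwrite:
        # Assumed behaviour for an existing file (outside the shown hunk).
        raise PathExistsError(dst)
    return dst
```

Consider copying `data` into `/out` with `overwrite=True` when `/out/data` is already a directory. The call now resolves to `/out/data`. Before this change, the inner call threw `PathIsDirectoryException` whatever the value of `overwrite`.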

hadoop-common-project/hadoop-common/src/site/markdown/filesystem/filesystem.md

Lines changed: 106 additions & 0 deletions
@@ -1419,6 +1419,112 @@ operations related to the part of the file being truncated is undefined.
### `boolean copyFromLocalFile(boolean delSrc, boolean overwrite, Path src, Path dst)`

The source file or directory at `src` is on the local disk and is copied into the file system at
destination `dst`. If the source must be deleted after the copy, the `delSrc` flag must be
set to TRUE. If the destination already exists and its contents must be overwritten, the
`overwrite` flag must be set to TRUE.

#### Preconditions
Source and destination must be different:
```python
if src = dest : raise FileExistsException
```

Destination and source must not be descendants of one another:
```python
if isDescendant(src, dest) or isDescendant(dest, src) : raise IOException
```

The source file or directory must exist locally:
```python
if not exists(LocalFS, src) : raise FileNotFoundException
```

Directories cannot be copied into files, regardless of what the overwrite flag is set to:

```python
if isDir(LocalFS, src) and isFile(FS, dst) : raise PathExistsException
```

In all other cases, the `overwrite` flag must be set to TRUE for the operation to succeed if the
destination exists. The only exception is the case for which the precondition above raises.
Setting the flag also overwrites any files or directories at the destination:

```python
if exists(FS, dst) and not overwrite : raise PathExistsException
```

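The preconditions can be checked mechanically. The following is a minimal Python model built on a few assumptions. Both file systems are represented as sets of absolute POSIX path strings. The Hadoop exceptions are mapped onto Python's `FileExistsError`, `OSError` and `FileNotFoundError`. `is_descendant` and `check_preconditions` are hypothetical helpers. Because `src` and `dst` are compared as plain strings, the equality and nesting checks only mean something when both paths name the same store.

```python
from pathlib import PurePosixPath
from typing import Set


def is_descendant(child: str, ancestor: str) -> bool:
    """True if `child` lies strictly below `ancestor`."""
    c, a = PurePosixPath(child), PurePosixPath(ancestor)
    return a in c.parents


def check_preconditions(local_files: Set[str], local_dirs: Set[str],
                        fs_files: Set[str], fs_dirs: Set[str],
                        src: str, dst: str, overwrite: bool) -> None:
    """Raise if a copyFromLocalFile precondition from the spec fails."""
    # FileExistsException: source and destination must differ.
    if src == dst:
        raise FileExistsError(f"source equals destination: {src}")
    # IOException: neither path may be nested inside the other.
    if is_descendant(src, dst) or is_descendant(dst, src):
        raise OSError(f"{src} and {dst} are nested")
    # FileNotFoundException: the source must exist locally.
    if src not in local_files and src not in local_dirs:
        raise FileNotFoundError(src)
    # PathExistsException: never copy a directory onto a file.
    if src in local_dirs and dst in fs_files:
        raise FileExistsError(f"directory {src} onto file {dst}")
    # PathExistsException: an existing destination needs overwrite.
    if (dst in fs_files or dst in fs_dirs) and not overwrite:
        raise FileExistsError(f"{dst} exists and overwrite is false")
```

This model follows the written specification, not `LocalFileSystem`. The local-filesystem tests in this commit (`testDestinationDirectoryToSelf`, `testSourceIntoDestinationSubDirectory`) expect the same-path and nested cases to succeed rather than raise.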
#### Determining the final name of the copy
Given a base path on the source `base` and a child path `child` where `base` is in
`ancestors(child) + child`:

```python
def final_name(base, child, dest):
    if base = child:
        return dest
    else:
        return dest + childElements(base, child)
```

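`final_name` maps a path inside the copied tree to its location under the destination. The runnable Python version below assumes that `childElements(base, child)` means the path elements of `child` below `base`. Here they are computed with `PurePosixPath.relative_to`, and `child_elements` is a hypothetical helper name.

```python
from pathlib import PurePosixPath


def child_elements(base: str, child: str) -> PurePosixPath:
    """Elements of `child` below `base`; ValueError if `child` is not under it."""
    return PurePosixPath(child).relative_to(PurePosixPath(base))


def final_name(base: str, child: str, dest: str) -> str:
    """Destination path of `child` when the tree at `base` is copied to `dest`."""
    if PurePosixPath(base) == PurePosixPath(child):
        return dest
    return str(PurePosixPath(dest) / child_elements(base, child))
```

For example, `final_name("/src", "/src/a/b.txt", "/dst")` evaluates to `"/dst/a/b.txt"`. `relative_to` raises `ValueError` when `child` is not under `base`, which mirrors the requirement that `base` be in `ancestors(child) + child`.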
#### Outcome where source is a file `isFile(LocalFS, src)`
For a file, the data at the destination becomes that of the source. All ancestors are directories.
```python
if isFile(LocalFS, src) and (not exists(FS, dest) or (exists(FS, dest) and overwrite)):
    FS' = FS where:
        FS'.Files[dest] = LocalFS.Files[src]
        FS'.Directories = FS.Directories + ancestors(FS, dest)
    LocalFS' = LocalFS where:
        not delSrc or (delSrc = true and delete(LocalFS, src, false))
else if isFile(LocalFS, src) and isDir(FS, dest):
    FS' = FS where:
        let d = final_name(src, dest)
        FS'.Files[d] = LocalFS.Files[src]
    LocalFS' = LocalFS where:
        not delSrc or (delSrc = true and delete(LocalFS, src, false))
```
The file changes are not expected to be atomic on either the local `LocalFS` or the remote `FS`.

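The file case can be modelled the same way. This Python sketch is an interpretation, not the S3A or HDFS code. Files are a `dict` from path to bytes, directories are a `set`, and `copy_local_file` is a hypothetical name. The two-argument `final_name(src, dest)` for a directory destination is read as "the destination directory plus the source file name". That is what `FileUtil.checkDest` does when it appends `srcName`. Refusing to write over an existing directory path is an added assumption that the specification does not state.

```python
from pathlib import PurePosixPath
from typing import Dict, Set


def copy_local_file(local_files: Dict[str, bytes], fs_files: Dict[str, bytes],
                    fs_dirs: Set[str], src: str, dest: str,
                    del_src: bool, overwrite: bool) -> str:
    """Hypothetical model of the file case; returns the path written."""
    if dest in fs_dirs:
        # Interpretation of final_name(src, dest): keep the source file name.
        target = str(PurePosixPath(dest) / PurePosixPath(src).name)
    else:
        target = dest
    if target in fs_dirs:
        # Assumption: the model refuses to replace a directory with a file.
        raise IsADirectoryError(target)
    if target in fs_files and not overwrite:
        raise FileExistsError(target)
    # The data at the destination becomes that of the source.
    fs_files[target] = local_files[src]
    # All ancestors of the destination are directories.
    for parent in PurePosixPath(target).parents:
        fs_dirs.add(str(parent))
    if del_src:
        del local_files[src]
    return target
```

Nothing in the model is atomic. An observer of `fs_files` could see the new file before `del_src` removes the source, which is consistent with atomicity not being expected.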
#### Outcome where source is a directory `isDir(LocalFS, src)`
```python
if isDir(LocalFS, src) and (isFile(FS, dest) or isFile(FS, dest + childElements(src))):
    raise FileAlreadyExistsException
else if isDir(LocalFS, src):
    if exists(FS, dest):
        dest' = dest + childElements(src)
        if exists(FS, dest') and not overwrite:
            raise PathExistsException
    else:
        dest' = dest

    FS' = FS where:
        forall c in descendants(LocalFS, src):
            not exists(FS', final_name(src, c, dest')) or overwrite
        and forall c in descendants(LocalFS, src) where isDir(LocalFS, c):
            FS'.Directories = FS'.Directories + (dest' + childElements(src, c))
        and forall c in descendants(LocalFS, src) where isFile(LocalFS, c):
            FS'.Files[final_name(src, c, dest')] = LocalFS.Files[c]
    LocalFS' = LocalFS where:
        not delSrc or (delSrc = true and delete(LocalFS, src, true))
```
The operation is not expected to be isolated or atomic: files can change in the source or the
destination while it is executing. No guarantees are made about the final state of the file or
directory after a copy beyond best effort. For example, when copying a directory, one file may
already have been copied from source to destination. Nothing prevents that new file at the
destination from being updated while the copy operation is still in progress.

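The following Python sketch models the directory case. It uses the same in-memory representation: a `dict` of files and a `set` of directories per file system. It reads `childElements(src)` as the last element of `src`, and it uses the hypothetical names `copy_local_dir` and `_under`. Files are written one at a time, so an overwrite failure part-way through leaves a partial copy. The specification permits this because it promises no atomicity.

```python
from pathlib import PurePosixPath
from typing import Dict, Set


def _under(path: str, root: str) -> bool:
    """True if `path` is `root` itself or lies below it."""
    p, r = PurePosixPath(path), PurePosixPath(root)
    return p == r or r in p.parents


def copy_local_dir(local_files: Dict[str, bytes], local_dirs: Set[str],
                   fs_files: Dict[str, bytes], fs_dirs: Set[str],
                   src: str, dest: str, del_src: bool, overwrite: bool) -> str:
    """Hypothetical model of the directory case; returns the root used."""
    nested = str(PurePosixPath(dest) / PurePosixPath(src).name)
    # A directory can never land on top of a file.
    if dest in fs_files or nested in fs_files:
        raise FileExistsError(dest)
    if dest in fs_dirs:
        # Existing destination: copy into it, under the source's name.
        root = nested
        if root in fs_dirs and not overwrite:
            raise FileExistsError(root)
    else:
        root = dest
    base = PurePosixPath(src)
    # Recreate every local directory (including src itself) under root.
    for d in sorted(x for x in local_dirs if _under(x, src)):
        fs_dirs.add(str(PurePosixPath(root) / PurePosixPath(d).relative_to(base)))
    # Copy files one by one; a failure here leaves a partial copy behind.
    for f in sorted(x for x in local_files if _under(x, src)):
        target = str(PurePosixPath(root) / PurePosixPath(f).relative_to(base))
        if target in fs_files and not overwrite:
            raise FileExistsError(target)
        fs_files[target] = local_files[f]
    if del_src:
        # Recursive delete of the local source tree.
        for f in [x for x in local_files if _under(x, src)]:
            del local_files[f]
        local_dirs.difference_update({x for x in local_dirs if _under(x, src)})
    return root
```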
#### Implementation

The default HDFS implementation recurses through each file and folder found at `src` and
copies them sequentially to their final destination (relative to `dst`).

Object-store-based file systems should be mindful of the limitations of this sequential
approach. They may take advantage of parallel uploads, and possibly re-order the files copied
into the store, to maximize throughput.

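A parallel upload can be sketched with `concurrent.futures.ThreadPoolExecutor`. This is not the S3A `CopyFromLocalOperation`. `upload(local_path, key)` is a hypothetical callback standing in for whatever store client is used, and `max_workers=8` is an arbitrary default. Starting the largest files first is one possible re-ordering heuristic, not necessarily the one S3A uses.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple


def plan_uploads(src_root: str, dest_root: str) -> List[Tuple[str, str]]:
    """Pair every local file under src_root with its destination key."""
    pairs: List[Tuple[str, str]] = []
    for dirpath, _dirnames, filenames in os.walk(src_root):
        for name in filenames:
            local = os.path.join(dirpath, name)
            rel = os.path.relpath(local, src_root).replace(os.sep, "/")
            pairs.append((local, dest_root.rstrip("/") + "/" + rel))
    # Assumed re-ordering heuristic: largest files first, so a big file
    # does not start last and become a long tail.
    pairs.sort(key=lambda p: os.path.getsize(p[0]), reverse=True)
    return pairs


def copy_tree_parallel(src_root: str, dest_root: str,
                       upload: Callable[[str, str], None],
                       max_workers: int = 8) -> int:
    """Upload all files concurrently via a hypothetical `upload` callback."""
    pairs = plan_uploads(src_root, dest_root)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(upload, local, key) for local, key in pairs]
        for future in futures:
            # result() re-raises any exception thrown inside upload().
            future.result()
    return len(pairs)
```

Because `upload` runs on several threads at once, it must be thread-safe. Calling `result()` on each future surfaces any upload error to the caller.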
## <a name="RemoteIterator"></a> interface `RemoteIterator`

The `RemoteIterator` interface is used as a remote-access equivalent
Lines changed: 98 additions & 0 deletions
@@ -0,0 +1,98 @@
/*
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements. See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership. The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License. You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.hadoop.fs;

import java.io.File;

import org.junit.Test;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.contract.AbstractContractCopyFromLocalTest;
import org.apache.hadoop.fs.contract.AbstractFSContract;
import org.apache.hadoop.fs.contract.localfs.LocalFSContract;

import static org.apache.hadoop.test.LambdaTestUtils.intercept;

public class TestLocalFSCopyFromLocal extends AbstractContractCopyFromLocalTest {
  @Override
  protected AbstractFSContract createContract(Configuration conf) {
    return new LocalFSContract(conf);
  }

  @Test
  public void testDestinationFileIsToParentDirectory() throws Throwable {
    describe("Source is a file and destination is its own parent directory");

    File file = createTempFile("local");
    Path dest = new Path(file.getParentFile().toURI());
    Path src = new Path(file.toURI());

    intercept(PathOperationException.class,
        () -> getFileSystem().copyFromLocalFile(true, true, src, dest));
  }

  @Test
  public void testDestinationDirectoryToSelf() throws Throwable {
    describe("Source is a directory and it is copied into itself with " +
        "delSrc flag set, destination must not exist");

    File source = createTempDirectory("srcDir");
    Path dest = new Path(source.toURI());
    getFileSystem().copyFromLocalFile(true, true, dest, dest);

    assertPathDoesNotExist("Source found", dest);
  }

  @Test
  public void testSourceIntoDestinationSubDirectoryWithDelSrc() throws Throwable {
    describe("Copying a parent folder inside a child folder with" +
        " delSrc=TRUE");
    File parent = createTempDirectory("parent");
    File child = createTempDirectory(parent, "child");

    Path src = new Path(parent.toURI());
    Path dest = new Path(child.toURI());
    getFileSystem().copyFromLocalFile(true, true, src, dest);

    assertPathDoesNotExist("Source found", src);
    assertPathDoesNotExist("Destination found", dest);
  }

  @Test
  public void testSourceIntoDestinationSubDirectory() throws Throwable {
    describe("Copying a parent folder inside a child folder with" +
        " delSrc=FALSE");
    File parent = createTempDirectory("parent");
    File child = createTempDirectory(parent, "child");

    Path src = new Path(parent.toURI());
    Path dest = new Path(child.toURI());
    getFileSystem().copyFromLocalFile(false, true, src, dest);

    Path recursiveParent = new Path(dest, parent.getName());
    Path recursiveChild = new Path(recursiveParent, child.getName());

    // This definitely counts as interesting behaviour which needs documenting.
    // Depending on the underlying system, this can recurse 15+ times.
    recursiveParent = new Path(recursiveChild, parent.getName());
    recursiveChild = new Path(recursiveParent, child.getName());
    assertPathExists("Recursive parent not found", recursiveParent);
    assertPathExists("Recursive child not found", recursiveChild);
  }
}
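A toy model can reproduce the recursion noted in `testSourceIntoDestinationSubDirectory`. It assumes the copy creates each destination directory before listing the corresponding source directory, so the listing sees what the copy has just created. In this Python sketch, the directory tree is a `dict` from path to child names, and `naive_copy` is a hypothetical function. `max_depth` stands in for whatever limit the underlying system eventually hits.

```python
from pathlib import PurePosixPath
from typing import Dict, List


def naive_copy(tree: Dict[str, List[str]], src: str, dst: str,
               max_depth: int, depth: int = 0) -> None:
    """Hypothetical recursive directory copy over an in-memory tree.

    `tree` maps each directory path to the names of its child directories.
    The destination is created *before* the source is listed, so a
    destination inside the source shows up in later listings.
    """
    if depth >= max_depth:
        # Stand-in for the limit the real file system eventually hits.
        return
    tree.setdefault(dst, [])
    parent, name = str(PurePosixPath(dst).parent), PurePosixPath(dst).name
    siblings = tree.setdefault(parent, [])
    if name not in siblings:
        siblings.append(name)
    # List the source only now, after the destination was created.
    for child in list(tree[src]):
        naive_copy(tree, f"{src}/{child}", f"{dst}/{child}", max_depth, depth + 1)
```

Copying `/p` to `/p/c/p` with `max_depth=6` produces `/p/c/p/c/p/c`. That is the same `parent/child/parent/child` shape the test asserts on, and the copy keeps nesting until the depth limit stops it.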
