
EDL: Elastic Deep Learning

EDL is an Elastic Deep Learning framework designed to help deep learning cloud service providers build cluster cloud services using the deep learning framework PaddlePaddle.

EDL includes two parts:

- A Kubernetes controller for the elastic scheduling of distributed deep learning jobs, plus tools for manual adjustment.
- Making PaddlePaddle a fault-tolerant deep learning framework with a user-friendly API for job management.
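Fault tolerance of this kind commonly rests on periodic checkpointing: a trainer that is restarted (or newly joined) resumes from the latest saved state instead of starting over. The sketch below illustrates that general pattern in plain Python; it is not EDL's API. The helpers `save_checkpoint`, `load_latest_checkpoint` and `train`, the JSON state layout, and the file naming scheme are all hypothetical.

```python
import json
import os
import tempfile


def save_checkpoint(ckpt_dir, epoch, state):
    """Hypothetical helper: atomically write the training state for `epoch`."""
    os.makedirs(ckpt_dir, exist_ok=True)
    # Zero-padded epoch numbers make lexicographic order equal numeric order.
    final_path = os.path.join(ckpt_dir, f"epoch_{epoch:06d}.json")
    # Write to a temp file first, then rename, so a killed process never
    # leaves a half-written checkpoint under the final name.
    fd, tmp_path = tempfile.mkstemp(dir=ckpt_dir, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump({"epoch": epoch, "state": state}, f)
    os.replace(tmp_path, final_path)
    return final_path


def load_latest_checkpoint(ckpt_dir):
    """Hypothetical helper: return (next_epoch, state), or (0, None) if none exists."""
    if not os.path.isdir(ckpt_dir):
        return 0, None
    names = sorted(n for n in os.listdir(ckpt_dir)
                   if n.startswith("epoch_") and n.endswith(".json"))
    if not names:
        return 0, None
    with open(os.path.join(ckpt_dir, names[-1])) as f:
        data = json.load(f)
    return data["epoch"] + 1, data["state"]


def train(ckpt_dir, num_epochs):
    """Hypothetical training loop that resumes from the latest checkpoint."""
    start_epoch, state = load_latest_checkpoint(ckpt_dir)
    state = state or {"steps": 0}
    for epoch in range(start_epoch, num_epochs):
        state["steps"] += 100  # stand-in for one epoch of real training
        save_checkpoint(ckpt_dir, epoch, state)
    return state
```

A trainer killed partway through would simply call `train(ckpt_dir, num_epochs)` again after restarting and continue from the epoch after the last saved one; `ckpt_dir` could, for example, be the directory given in `PADDLE_EDL_FLEET_CHECKPOINT_PATH` (an assumption about how that path is used).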

EDL is an incubation-stage project of the LF AI Foundation.

While many hardware and software manufacturers are working on reducing the running time of deep learning jobs, EDL optimizes:

- the global utilization of the cluster, and
- the waiting time of job submitters.
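One way to balance these two goals is a two-pass allocation: first give each pending job its minimum number of trainers, so as many jobs as possible start instead of waiting, then hand leftover GPUs to the started jobs up to their maximum, so fewer GPUs sit idle. The sketch below is a hypothetical illustration of that idea, not the EDL scheduler; the `Job` fields `min_trainers` and `max_trainers`, one GPU per trainer, unique job names, and list-order (first come, first served) priority are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    min_trainers: int  # hypothetical: fewest trainers the job can run with
    max_trainers: int  # hypothetical: most trainers the job can use


def allocate(jobs, free_gpus):
    """Return {job name: trainer count}, assuming one GPU per trainer.

    Jobs are considered in list order. A job that cannot get its minimum
    waits (count 0), but a later, smaller job may still start.
    """
    alloc = {job.name: 0 for job in jobs}
    started = []
    # Pass 1: start as many jobs as possible at their minimum size,
    # which shortens queueing time for job submitters.
    for job in jobs:
        if job.min_trainers <= free_gpus:
            alloc[job.name] = job.min_trainers
            free_gpus -= job.min_trainers
            started.append(job)
    # Pass 2: grow started jobs toward their maximum with leftover GPUs,
    # which raises overall cluster utilization.
    for job in started:
        extra = min(job.max_trainers - alloc[job.name], free_gpus)
        alloc[job.name] += extra
        free_gpus -= extra
    return alloc
```

For example, with 10 free GPUs and jobs a (2 to 8 trainers), b (exactly 4) and c (exactly 6), the sketch gives a 6 and b 4 while c waits. Once b finishes, rerunning it for a and c over the same 10 GPUs gives a 4 and c 6: c starts and a shrinks, which is the kind of adjustment an elastic scheduler can make.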

Key Features:

- Efficiency: Provides parallelism strategies to minimize adjustment overheads.
- Consistency: Accuracy verified on multiple models, compared with training without scaling.
- Flexibility: Any component can be killed or can join at any time.
- Easy to use: Only a few lines of code need to be added to support EDL.
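Flexibility and consistency are linked: when trainers can join or leave at any time, the live workers must agree on new ranks and on how to split the work. The sketch below shows one hypothetical way to do this: ranks are recomputed from the sorted set of live worker IDs, and a fixed global batch size (1024, as in the experiment results table) is split across them. Keeping the global batch size constant is an assumption here, one common way to keep results comparable to an unscaled run, not a documented EDL behavior; `reassign` and the worker ID strings are hypothetical.

```python
def reassign(worker_ids, global_batch_size):
    """Hypothetical: map each live worker ID to (rank, per-worker batch size).

    Ranks come from sorted IDs, so every worker computes the same mapping
    independently. The global batch size stays fixed; any remainder is
    spread one sample at a time over the lowest ranks.
    """
    workers = sorted(set(worker_ids))
    if not workers:
        return {}
    base, remainder = divmod(global_batch_size, len(workers))
    return {
        wid: (rank, base + (1 if rank < remainder else 0))
        for rank, wid in enumerate(workers)
    }
```

For instance, `reassign(["w0", "w1", "w2"], 1024)` gives batch sizes 342, 341 and 341; if `w1` is killed, the survivors `w0` and `w2` get ranks 0 and 1 with 512 samples each.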

Quick start demo: EDL Resnet50 experiments on a single machine:

We highly recommend running it in our Docker image:

Start a Jobserver on one node.

docker pull hub.baidubce.com/paddle-edl/paddle_edl:latest-cuda10.0-cudnn7
cd example/demo/collective
./start_job_server.sh

Start a Jobclient which controls the worker process.

# Set the ImageNet data path
export PADDLE_EDL_IMAGENET_PATH=<your path>
# Set the checkpoint path
export PADDLE_EDL_FLEET_CHECKPOINT_PATH=<your path>
mkdir -p resnet50_pod
./start_job_client.sh
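Before running `./start_job_client.sh`, it can help to confirm that both paths are actually set. The following is a hypothetical pre-flight check, not part of EDL: `check_env` and `REQUIRED_VARS` are illustrative names, and it only reads the two environment variables named in the commands, additionally requiring the ImageNet path to be an existing directory (the checkpoint directory is not checked, on the assumption that training may create it).

```python
import os

# The two variables exported in the job client commands.
REQUIRED_VARS = ("PADDLE_EDL_IMAGENET_PATH", "PADDLE_EDL_FLEET_CHECKPOINT_PATH")


def check_env(environ=os.environ):
    """Return a list of problems with the demo environment (empty if OK)."""
    problems = []
    for var in REQUIRED_VARS:
        value = environ.get(var)
        if not value:
            problems.append(f"{var} is not set")
        elif var == "PADDLE_EDL_IMAGENET_PATH" and not os.path.isdir(value):
            # The dataset must already exist before the workers start.
            problems.append(f"{var}={value} is not a directory")
    return problems
```

Calling `check_env()` in the shell where the variables were exported and printing any returned messages catches a missing or mistyped path before the worker processes are launched.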

Experiment results

total batch size	acc1	acc5
1024	76.0	75.8

Design Docs

A scheduler on Kubernetes:
- Scheduler
EDL framework on PaddlePaddle:
- Fault-Tolerant Training in PaddlePaddle
- EDL framework

Applications:

EDL Distillation:
EDL CTR
- EDL CTR training and deployment on Baidu Cloud

FAQ

TBD

License

EDL is provided under the Apache-2.0 license.
