1 file changed, +16 -6 lines changed
@@ -90,8 +90,17 @@ python -m paddle.distributed.launch --selected_gpus 0 \
 <img src="doc/distill.gif" width="550">
 </p>
 
-# EDL Framework
-## Quickstart:EDL Resnet50 experiments on a single machine in docker:
+<h2 align="center"> Release 0.2.0 </h2>
+
+<h3 align="center"> Checkpoint based elastic training on multiple GPUs </h3>
+
+- We have several training nodes running on each GPU.
+- A master node is responsible for checkpoint saving, and all the other nodes are elastic nodes.
+- When elastic nodes join or leave the current training job, the training hyper-parameters are adjusted automatically.
+- Newly joining training nodes load the checkpoint from the remote FS automatically.
+- A model checkpoint is saved every several steps, at an interval given by the user.
+
+<h3 align="center"> Resnet50 experiments on a single machine in docker </h3>
 
 1. Start a JobServer on one node which generates changing scripts.
 
@@ -137,10 +146,11 @@ python -u paddle_edl.demo.collective.job_client_demo \
 The whole example is [here](example/demo/collective)
 
 
-##  FAQ
+<h2 align="center">  FAQ </h2>
 
-TBD
-
-## License
 
+<h2 align="center"> License </h2>
 EDL is provided under the [Apache-2.0 license](LICENSE).
+
+<h2 align="center"> Contribution </h2>
+If you want to contribute code to Paddle Serving, please reference
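
For readers of this diff, the checkpoint-based elastic training described in the Release 0.2.0 bullets can be illustrated with a small, framework-free sketch. It is not EDL's implementation or API: every name here (`scaled_lr`, `save_checkpoint`, `load_checkpoint`, `train`, `nodes_at_step`) is hypothetical, a local JSON file stands in for the remote FS, and the linear learning-rate scaling rule is only one assumed way the hyper-parameters might be adjusted when nodes join or leave.

```python
import json
import os


def scaled_lr(base_lr, base_nodes, current_nodes):
    """Assumed linear scaling rule: learning rate grows with the node count."""
    return base_lr * current_nodes / base_nodes


def save_checkpoint(path, step, state):
    """Write atomically so a joining node never reads a partial file."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"step": step, "state": state}, f)
    os.replace(tmp, path)


def load_checkpoint(path):
    """Return (step, state), or (0, None) when no checkpoint exists yet."""
    if not os.path.exists(path):
        return 0, None
    with open(path) as f:
        data = json.load(f)
    return data["step"], data["state"]


def train(ckpt_path, total_steps, save_every, nodes_at_step,
          is_master, base_lr=0.1, base_nodes=1):
    """Resume from the latest checkpoint and run up to total_steps.

    nodes_at_step(step) is a hypothetical callback reporting how many
    nodes are in the job at that step; only the master saves checkpoints.
    """
    start, state = load_checkpoint(ckpt_path)
    weight = state["weight"] if state else 0.0
    for step in range(start + 1, total_steps + 1):
        lr = scaled_lr(base_lr, base_nodes, nodes_at_step(step))
        weight += lr  # stand-in for a real optimizer update
        if is_master and step % save_every == 0:
            save_checkpoint(ckpt_path, step, {"weight": weight})
    return weight
```

In this sketch a node that joins late simply calls `train` with the shared checkpoint path: it picks up at the last saved step rather than step 0, and the learning rate follows whatever node count `nodes_at_step` reports.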