Slide 1: A Reinforcement Learning Based Routing Protocol with QoS Support for Biomedical Sensor Networks
Authors: Xuedong Liang, Ilangko Balasingham, Sang-Seon Byun
The Interventional Center, Rikshospitalet University Hospital, Oslo, Norway N-0027
Dept. of Informatics, University of Oslo, Oslo, Norway N-0316
Dept. of Electronics and Telecommunications, Norwegian University of Science and Technology, Trondheim, Norway N-7491
Presented by: Iffat Anjum (Roll: 16), Nazia Alam (Roll: 28), 15th Batch
Date: 26th April, 2012
Green Networking Research Group, Dept. of Computer Science and Engineering, University of Dhaka
Slide 2: Contents
• Contribution
• Problem Definition
  • Related Works
  • Biomedical Sensor Networks
  • Reinforcement Learning
  • Q-learning
• Design of RL-QRP
  • Local Information Exchange
  • Q-learning Implementation
  • Learning-Based Routing Algorithm
• Performance Evaluation
• Limitation
Slide 3: Contributions
• In RL-QRP, optimal routing policies are found through experience and rewards, without the need to maintain precise network state information.
• Because it accounts for the impact of network traffic load and sensor node mobility on network performance, RL-QRP fits well in dynamic environments.
• RL-QRP performs well in terms of a number of QoS metrics and energy efficiency in various medical scenarios.
Slide 4: Problem Definition
The main function of a biomedical sensor network is to ensure that data packets are sensed and delivered to the medical server reliably and efficiently.
Related Works
A number of QoS-aware routing protocols have recently been proposed for wireless sensor networks:
• INSIGNIA supports QoS in mobile ad hoc networks; its framework is based on in-band signaling and soft-state resource management. It is not suitable for biomedical sensor networks because its resource reservation scheme is inflexible.
Slide 5: Problem Definition (Related Works)
• CEDAR is a core-extraction distributed ad hoc routing algorithm for QoS routing in ad hoc network environments. However, the core can become the bottleneck of the network, and selecting and maintaining the core consumes extra network resources.
• AdaR adaptively learns an optimal strategy to achieve multiple optimization goals. However, it does not define how to map diverse QoS requirements into concrete Q-values.
Most previous QoS-aware routing protocols suffer from:
• Heavy communication overhead.
• The computational burden of complicated algorithms.
Slide 6: Problem Definition (Biomedical Sensor Networks)
• A biomedical sensor network is deployed in a certain area: sensor nodes are implanted in or attached to patients' bodies, and sink nodes are deployed at fixed positions.
• Biomedical sensor networks have the following features:
  • Dynamic network topology: a sensor node may leave, join, or die (run out of battery).
  • Time-varying wireless channel with serious electrical interference.
  • Each sensor node has its own QoS requirements, duty cycle, packet arrival rate, and forwarding willingness.
Slide 7: Problem Definition (Biomedical Sensor Networks)
• Each mobile node is aware of its geographic location, obtained either from the Global Positioning System (GPS) or from distributed localization services.
• Each node is aware of its immediate neighbors (those within its radio range) and their locations through beacon exchanges.
• Mobile sensor nodes move according to the Random Waypoint Mobility Model (RWMM).
• This paper focuses on two QoS requirements:
  • Packet delivery ratio.
  • End-to-end delay.
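The Random Waypoint Mobility Model named above can be illustrated with a minimal sketch. It follows the textbook form of the model (pick a random destination and speed, move there in a straight line, pause, repeat). The function name `random_waypoint`, its parameters, and the fixed time step are assumptions for illustration and are not taken from the slides.

```python
import math
import random


def random_waypoint(start, area, speed_range, pause, steps, dt=1.0, rng=None):
    """Return a list of (x, y) positions, one per time step.

    start       -- initial (x, y), assumed to lie inside the area
    area        -- (width, height) of the rectangular deployment area
    speed_range -- (min_speed, max_speed), min_speed must be > 0
    pause       -- pause time spent at each waypoint
    """
    rng = rng or random.Random()
    width, height = area
    x, y = start
    target = (rng.uniform(0, width), rng.uniform(0, height))
    speed = rng.uniform(*speed_range)
    wait = 0.0
    trace = []
    for _ in range(steps):
        if wait > 0:
            # Node is pausing at a waypoint.
            wait = max(0.0, wait - dt)
        else:
            dx, dy = target[0] - x, target[1] - y
            dist = math.hypot(dx, dy)
            step = speed * dt
            if dist <= step:
                # Waypoint reached: pause, then draw a new destination and speed.
                x, y = target
                wait = pause
                target = (rng.uniform(0, width), rng.uniform(0, height))
                speed = rng.uniform(*speed_range)
            else:
                # Move in a straight line toward the current waypoint.
                x += dx / dist * step
                y += dy / dist * step
        trace.append((x, y))
    return trace


trace = random_waypoint((5.0, 5.0), (10.0, 10.0), (0.5, 1.5), pause=2.0,
                        steps=200, rng=random.Random(42))
```

Because each move stays on the segment between two points inside the rectangle, the node never leaves the deployment area.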
Slide 8: Problem Definition (Reinforcement Learning)
Figure: A reinforcement learning model.
Slide 9: Problem Definition (Reinforcement Learning)
• The model underlying reinforcement learning is the Markov Decision Process (MDP).
• An MDP models an agent with a tuple (S, A, P, R):
  • S is the set of states.
  • A is the set of actions.
  • P(s' | s, a) is the transition model, which gives the probability of entering state s' after executing action a at state s.
  • R(s, a, s') is the reward obtained when the agent executes a at s and enters s'.
• The goal of solving an MDP is to find an optimal policy π : S → A that maps states to actions such that the cumulative reward is maximized.
Slide 10: Problem Definition (Q-learning)
• Q-learning is a model-free method that computes a function Q(s, a) to find an optimal decision policy.
• Each time an action a is executed, the agent receives an immediate reward r from the environment.
  • Q(s, a) denotes the quality of action a at state s; α is the learning rate, and γ models the weight of future rewards.
  • Q(s', a') is the expected future reward at state s' when taking action a'.
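The update equation on this slide did not survive in the transcript. For reference, the standard textbook Q-learning update uses the symbols defined above: Q(s, a) ← Q(s, a) + α [r + γ max over a' of Q(s', a') − Q(s, a)]. The sketch below implements that standard rule. The function name `q_update`, the dictionary-based table, and the default values of α and γ are assumptions for illustration, not values from the paper.

```python
from collections import defaultdict


def q_update(q, state, action, reward, next_state, next_actions,
             alpha=0.5, gamma=0.9):
    """Apply one standard Q-learning update and return the new Q(state, action).

    q is a mapping from (state, action) pairs to Q-values; missing entries
    are treated as 0.0 (assumed initial value).
    """
    # Best expected future reward reachable from the next state.
    best_next = max((q[(next_state, a)] for a in next_actions), default=0.0)
    # Temporal-difference target: immediate reward plus discounted future reward.
    td_target = reward + gamma * best_next
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


q = defaultdict(float)
first = q_update(q, "s1", "a1", reward=1.0, next_state="s2",
                 next_actions=["a1", "a2"])
second = q_update(q, "s1", "a1", reward=1.0, next_state="s2",
                  next_actions=["a1", "a2"])
```

With α = 0.5, each update moves Q(s, a) halfway toward the target, so repeated identical rewards make the estimate converge geometrically.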
Slide 11: Design of RL-QRP
• QoS route computation and selection are based on a distributed reinforcement learning algorithm.
• Each sensor node calculates its routes independently.
• The Q-value Q(s, a) stands for the quality (the progress made) of action a at state s.
Figure: Reinforcement learning based routing model.
Slide 12: Design of RL-QRP
QoS Support Consideration
• Each node checks the QoS requirement of the data packet against its Q-value table.
• The node then checks whether it can make a certain amount of progress with the data packet. If so, it forwards the packet to the neighboring node with the highest Q-value; if not, the packet is dropped or sent as 'best effort'.
Local Information Exchange
Local information is exchanged through beacons with 1-hop neighboring sensor nodes. Each exchange contains:
• Position information.
• Q-values.
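The forwarding decision described on this slide could be sketched roughly as follows. The slide does not say how "progress" is compared with the QoS requirement, so the threshold test against the best Q-value is an assumption, as are the name `choose_next_hop` and its parameters.

```python
from typing import Dict, Optional


def choose_next_hop(q_values: Dict[str, float],
                    required_progress: float,
                    best_effort: bool = False) -> Optional[str]:
    """Pick the neighbor with the highest Q-value, or None to drop the packet.

    q_values          -- neighbor id -> Q-value learned from beacons and ACKs
    required_progress -- hypothetical threshold derived from the packet's
                         QoS requirement
    best_effort       -- if True, forward to the best neighbor even when the
                         requirement cannot be met
    """
    if not q_values:
        return None  # no 1-hop neighbor is known
    best = max(q_values, key=q_values.get)
    if q_values[best] >= required_progress or best_effort:
        return best
    return None  # requirement cannot be met and best effort is disabled


hop = choose_next_hop({"n1": 0.2, "n2": 0.7}, required_progress=0.5)
```

Returning `None` models the "drop" branch; a caller that prefers best-effort delivery passes `best_effort=True` instead.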
Slide 13: Design of RL-QRP (Q-learning Implementation)
• State: S = {s_i}, i = 1, 2, ..., N, where N is the number of sensor nodes. Each node is a state s ∈ S.
• Action: A = {a(s_j | s_i)}, s_i, s_j ∈ S. Executing a(s_j | s_i) means that a packet is forwarded from state s_i to s_j, provided s_i and s_j are within each other's communication range.
• Reward function: R_n = prg(P_n). R_n is the reward for executing the action; it describes the progress made in forwarding data packet P_n.
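As a small illustration of the state and action sets above, the sketch below builds the set of available actions a(s_j | s_i) for each node from node positions. It assumes 2-D coordinates and a single radio range shared by all nodes, which the slides do not state; `build_actions` is a hypothetical helper.

```python
import math
from typing import Dict, List, Tuple


def build_actions(positions: Dict[str, Tuple[float, float]],
                  radio_range: float) -> Dict[str, List[str]]:
    """Map each state s_i to the list of states s_j it can forward to.

    An action a(s_j | s_i) exists only if s_j is within communication
    range of s_i (assumed here to be one shared radio range).
    """
    actions = {}
    for i, pos_i in positions.items():
        actions[i] = [
            j for j, pos_j in positions.items()
            if j != i and math.dist(pos_i, pos_j) <= radio_range
        ]
    return actions


actions = build_actions(
    {"s1": (0.0, 0.0), "s2": (3.0, 4.0), "s3": (10.0, 0.0)},
    radio_range=5.0,
)
```

In this example s1 and s2 are exactly 5 units apart and can forward to each other, while s3 has no neighbor in range.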
Slide 14: Design of RL-QRP (Q-learning Implementation)
• The reward of an action is implemented using an ACK scheme: when node s_j receives a packet from node s_i, s_j acknowledges it by sending an ACK packet.
• By calculating the 1-hop delay and the ratio of ACKs received to data packets sent, s_i can estimate the properties of the link between s_i and s_j.
• T_{s_i s_j} is the experienced delay between nodes s_i and s_j.
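A minimal sketch of how node s_i might track these two link properties for one neighbor s_j is shown below. The class `LinkEstimator` and its methods are hypothetical. Measuring the delay from the send time to the ACK arrival is an assumption, since the slide's delay formula was not preserved.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class LinkEstimator:
    """Per-neighbor statistics kept by the sending node."""
    sent: int = 0
    acked: int = 0
    total_delay: float = 0.0
    _pending: Dict[int, float] = field(default_factory=dict)

    def on_send(self, seq: int, now: float) -> None:
        # Record the send time so the delay can be computed when the ACK arrives.
        self.sent += 1
        self._pending[seq] = now

    def on_ack(self, seq: int, now: float) -> None:
        sent_at = self._pending.pop(seq, None)
        if sent_at is None:
            return  # duplicate or unknown ACK, ignore it
        self.acked += 1
        self.total_delay += now - sent_at

    def delivery_ratio(self) -> float:
        # Number of ACKs received divided by number of data packets sent.
        return self.acked / self.sent if self.sent else 0.0

    def mean_delay(self) -> float:
        # Average experienced 1-hop delay over acknowledged packets.
        return self.total_delay / self.acked if self.acked else float("inf")


link = LinkEstimator()
link.on_send(seq=1, now=0.0)
link.on_send(seq=2, now=1.0)
link.on_ack(seq=1, now=0.2)
```

Here one of two packets has been acknowledged, so the estimated delivery ratio is 0.5 and the mean delay is the single observed value.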
Slide 15: Design of RL-QRP (Learning-Based Routing Algorithm)
Slide 16: Design of RL-QRP (Learning-Based Routing Algorithm)
Slide 17: Performance Evaluation
Fig: Average end-to-end delay to the sink node.
Fig: Average packet delivery ratio to the sink node.
Slide 18: Performance Evaluation
Fig: The impact of node mobility on average packet delivery ratio.
Fig: The impact of network traffic load on average end-to-end delay.
Slide 19: Limitation
• RL-QRP neglects many common QoS requirements, such as network lifetime, throughput, and connectivity.
• A sensor node does not consider the interactions between itself and other sensor nodes; this approach is not sufficient to achieve global optimization.
  • Sensor nodes should consider their interactions with both the environment and the other nodes in the network, and cooperatively calculate QoS routes within a multi-agent reinforcement learning (MaRL) framework.
Slide 20: Thank You
