The document presents a reinforcement learning-based routing protocol (RL-QRP) designed for biomedical sensor networks, which allows for optimal routing policies without precise network state information. It discusses the challenges faced by existing QoS support routing protocols and provides insights into the implementation of a Q-learning approach for dynamic and mobile network environments. The performance evaluation demonstrates RL-QRP's effectiveness in terms of QoS metrics and energy efficiency, while also acknowledging limitations regarding certain QoS requirements and interactions between sensor nodes.