This document describes the implementation of a Q-learning-based routing protocol intended to extend the lifetime of wireless sensor networks (WSNs). The research applies reinforcement learning to develop routing protocols that use local node information, such as residual energy and hop length, to make data-forwarding decisions. Simulation-based performance evaluations indicate improvements in network lifetime, throughput, and end-to-end delay compared to the Ad hoc On-Demand Distance Vector (AODV) protocol.
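To make the idea concrete, the following is a minimal sketch of how a node might score and pick a next hop with tabular Q-learning. It is not the protocol evaluated in this work: the reward function, its weights (`W_ENERGY`, `W_HOPS`), the hyperparameters (`ALPHA`, `GAMMA`, `EPSILON`), and all function names are hypothetical, chosen only to show how residual energy and hop length could feed a standard Q-learning update.

```python
import random

# All constants below are assumed values for illustration only.
ALPHA = 0.5     # learning rate
GAMMA = 0.8     # discount factor
EPSILON = 0.1   # exploration probability
W_ENERGY = 0.7  # hypothetical weight on residual energy
W_HOPS = 0.3    # hypothetical weight on hop length


def reward(residual_energy, initial_energy, hops_to_sink, max_hops):
    """Hypothetical reward: higher for neighbors with more remaining
    energy and fewer hops to the sink. Both terms lie in [0, 1]."""
    energy_term = residual_energy / initial_energy
    hop_term = 1.0 - hops_to_sink / max_hops
    return W_ENERGY * energy_term + W_HOPS * hop_term


def choose_next_hop(q_row, rng=random):
    """Epsilon-greedy choice; q_row maps neighbor id -> Q-value."""
    if rng.random() < EPSILON:
        return rng.choice(sorted(q_row))  # explore a random neighbor
    return max(q_row, key=q_row.get)      # exploit the best-known neighbor


def update_q(q, node, neighbor, r):
    """Standard Q-learning update after forwarding from node to neighbor."""
    # Best value the neighbor can offer onward; 0.0 if it has no entries.
    best_next = max(q.get(neighbor, {}).values(), default=0.0)
    q[node][neighbor] += ALPHA * (r + GAMMA * best_next - q[node][neighbor])
    return q[node][neighbor]


# Usage: node A has two neighbors. B has more energy but is one hop
# farther from the sink than C.
q = {"A": {"B": 0.0, "C": 0.0}, "B": {}, "C": {}}
r_b = reward(residual_energy=0.9, initial_energy=1.0, hops_to_sink=2, max_hops=10)
r_c = reward(residual_energy=0.2, initial_energy=1.0, hops_to_sink=1, max_hops=10)
update_q(q, "A", "B", r_b)
update_q(q, "A", "C", r_c)
```

With these assumed weights, the energy-rich neighbor B ends up with the higher Q-value even though C is closer to the sink, which illustrates how an energy-aware reward can steer traffic away from depleted nodes. A different weighting would shift that trade-off toward shorter paths.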