The paper studies a pre-processing algorithm for information retrieval systems, focusing on how tokenization affects retrieval efficiency. It discusses the shortcomings of traditional tokenization methods and proposes an approach based on document vectors, which it reports improves accuracy and reduces search time. The paper then details the steps of the proposed algorithm and illustrates how they streamline tokenization for information retrieval tasks.
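To make the idea of tokenizing text into a document vector concrete, the following is a minimal sketch of a generic pipeline, not the paper's algorithm, whose individual steps are not reproduced in this summary. The stop-word list, the function names (`tokenize`, `document_vector`, `search`), and the frequency-sum scoring are all assumptions chosen for illustration.

```python
import re
from collections import Counter

# Hypothetical stop-word list, for illustration only (not taken from the paper).
STOP_WORDS = frozenset({"a", "an", "and", "the", "of", "in", "for", "is", "to"})


def tokenize(text):
    """Lowercase the text, keep alphanumeric runs, and drop stop words."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [w for w in words if w not in STOP_WORDS]


def document_vector(text):
    """Represent a document as a mapping from token to its frequency."""
    return Counter(tokenize(text))


def search(query, vectors):
    """Return ids of matching documents, highest summed query-term frequency first."""
    query_tokens = tokenize(query)
    scores = {
        doc_id: sum(vec[t] for t in query_tokens)  # Counter yields 0 for absent tokens
        for doc_id, vec in vectors.items()
    }
    return sorted((d for d, s in scores.items() if s > 0), key=lambda d: -scores[d])


# Usage: build the vectors once, then answer queries against them.
docs = {
    "d1": "Tokenization is the first step in information retrieval.",
    "d2": "The weather is nice.",
}
vectors = {doc_id: document_vector(text) for doc_id, text in docs.items()}
print(search("information retrieval", vectors))  # prints ['d1']
```

Because the vectors are built once at pre-processing time, a query only needs dictionary lookups rather than a rescan of the raw text, which is the general reason vector-based pre-processing can shorten search time.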