Hi! I intend to build a simple web crawler in Elixir. Since a web crawler is really just a pipeline of url-queue -> fetcher -> parser (with some filters and rate limiters in between), I have decided to use GenStage for it. I have already built the URL queue, and now I'm thinking about how to build the fetcher.
Obviously, the fetcher will need to perform HTTP requests in parallel, and since there is a limit to how many I should run at a time (bandwidth, right? Correct me if I'm wrong about this), I should use some sort of pooling system. I have looked at ConsumerSupervisor in the GenStage docs, but it only covers the last part of a pipeline, the consumer stage.
How should I go about implementing this when the fetcher stage is a producer_consumer (so I can't use ConsumerSupervisor)?
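
To make the question concrete, here is a minimal sketch of one way the number of in-flight requests could be capped. It is an illustration under assumptions, not a finished fetcher: `FetcherSketch`, `fetch_batch/3` and `fetch_fun` are hypothetical names, and `fetch_fun` stands in for whatever HTTP client call ends up being used. It relies only on `Task.async_stream/3` from the standard library.

```elixir
defmodule FetcherSketch do
  # Hypothetical helper: runs `fetch_fun` on every URL with at most
  # `max_concurrency` calls in flight at any moment.
  # Returns a list of {url, result} tuples in the same order as `urls`.
  # Assumes `urls` is a list and that `fetch_fun` returns a value instead of
  # raising (a crashing task would also take down the linked caller).
  @timeout 15_000

  def fetch_batch(urls, fetch_fun, max_concurrency) do
    results =
      urls
      |> Task.async_stream(fetch_fun,
        max_concurrency: max_concurrency,
        timeout: @timeout,
        # Kill a task that exceeds the timeout instead of exiting the caller.
        on_timeout: :kill_task
      )
      |> Enum.to_list()

    # async_stream keeps input order by default, so zipping is safe here.
    urls
    |> Enum.zip(results)
    |> Enum.map(fn
      {url, {:ok, value}} -> {url, {:ok, value}}
      {url, {:exit, reason}} -> {url, {:error, reason}}
    end)
  end
end
```

In a `:producer_consumer` stage, `handle_events/3` could call something like `fetch_batch(urls, fetch_fun, 10)` and return `{:noreply, results, state}`, with `max_demand` on the subscription bounding how many URLs arrive per batch. One caveat of this shape is that the stage blocks until the whole batch has finished, so a single slow request delays the rest of its batch.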