The HTML is very plain, with almost no attributes to select on:
    <!doctype html>
    <html>
      <body>
        <section class="body-copy">
          <h2>Topic 1</h2>
          <p>data-a</p>
          <p>data-b</p>
          <p>data-c</p>
          <h2>Topic 2</h2>
          <p>data-d</p>
          <p>data-e</p>
          <p>data-f</p>
          <h2>Topic 3</h2>
          <p>data-g</p>
          <p>data-h</p>
          <p>data-i</p>
        </section>
      </body>
    </html>
I’m using Floki and trying to parse this into a list of maps like so:
    %{ topic: "Topic 1", data: "data-a" }
    %{ topic: "Topic 1", data: "data-b" }
    %{ topic: "Topic 1", data: "data-c" }
    %{ topic: "Topic 2", data: "data-d" }
    %{ topic: "Topic 2", data: "data-e" }
    %{ topic: "Topic 2", data: "data-f" }
I’m struggling to get all the P tags under each H2 with Floki.
    # Try loading this HTML
    path = "/Users/Foo/Desktop/test.html"
    {:ok, local_file} = File.read(path)

    # This returns all the h2 elements
    Floki.find(local_file, "h2")
    # => [{"h2", [], ["Topic 1"]}, {"h2", [], ["Topic 2"]}, {"h2", [], ["Topic 3"]}]

    # This returns the first p after a specific h2, but not all of them
    Floki.find(local_file, "h2:nth-of-type(1) + p")
    # => [{"p", [], ["data-a"]}]

    # This returns the first p for Topic 2, but I need 2 more p tags (data-e, data-f)
    Floki.find(local_file, "h2:nth-of-type(2) + p")
    # => [{"p", [], ["data-d"]}]

    # This returns the first p for Topic 3, but I need 2 more p tags (data-h, data-i)
    Floki.find(local_file, "h2:nth-of-type(3) + p")
    # => [{"p", [], ["data-g"]}]
Question
I cannot figure out how to get all of the P tags that follow each H2, and ONLY those, stopping at the next H2.
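One direction that might work, since the `+` combinator only ever matches the element immediately after the `h2`: skip selectors for the grouping step and walk the section's children in order, remembering the last `h2` seen. The sketch below is untested against the real file. `TopicGrouper`, `group/1` and the simplified `text/1` helper are names made up for illustration, and it assumes the input is a list of `{tag, attrs, children}` tuples in the same shape Floki returned above (for example, the children of the `section.body-copy` node, possibly obtained with `Floki.children/1`).

```elixir
defmodule TopicGrouper do
  # Walks sibling nodes in document order. Each <h2> updates the current
  # topic; each <p> seen after an <h2> produces one map with that topic.
  def group(nodes) do
    {_topic, rows} =
      Enum.reduce(nodes, {nil, []}, fn
        {"h2", _attrs, children}, {_topic, rows} ->
          {text(children), rows}

        {"p", _attrs, children}, {topic, rows} when not is_nil(topic) ->
          {topic, [%{topic: topic, data: text(children)} | rows]}

        # Whitespace text nodes, other tags, and any <p> before the first <h2>
        _other, acc ->
          acc
      end)

    # Rows were prepended, so restore document order.
    Enum.reverse(rows)
  end

  # Simplified text extraction (an assumption): joins only direct text children.
  defp text(children) do
    children |> Enum.filter(&is_binary/1) |> Enum.join() |> String.trim()
  end
end

# Hand-written nodes in the tuple shape shown in the Floki.find/2 results above.
nodes = [
  {"h2", [], ["Topic 1"]},
  {"p", [], ["data-a"]},
  {"p", [], ["data-b"]},
  {"h2", [], ["Topic 2"]},
  {"p", [], ["data-d"]}
]

rows = TopicGrouper.group(nodes)
IO.inspect(rows)
```

This relies on the `h2` and `p` elements being direct siblings, as they are in the HTML above. If the paragraphs were nested deeper, or contained inline tags, the `text/1` helper would need to be replaced with something like `Floki.text/1`.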