People really, really want LLMs to output a highly reliable finished product, and I suspect we're never gonna get there. Lots of progress over the past couple of years, but not on that front.
I think it's much more interesting to focus on use cases that don't require that: ones where gen AI is an intermediate step, a producer of input for something downstream (whether humans or other programs).
Maybe there's no way to get there at all, and this is just one example of a fundamental LLM limitation.