[SOLVED] [Beautiful Soup] How to deprettify?

Winfried · (This post was last modified: May-01-2025, 03:46 PM by Winfried.)

Hello,

I made the mistake of using soup.prettify() to save soups to files, and I now have extra whitespace that shows up as stray spaces when viewing the files in an HTML WYSIWYG editor.

The following code fails to remove that whitespace.

Before I write a Python script to run the files through Tidy instead, does someone know if it can be fixed with BS?

Thank you.

for file in glob.glob("*.html"):
    BASE = Path(file).stem
    OUTPUTFILE = fr"{BASE}.CONV.html"
    soup = BeautifulSoup(open(file,"br"),"lxml")
    for tag in soup.find_all():
        if tag.string:
            tag.string.replace_with(' '.join(tag.string.split()))
            print(tag.string)
        else:
            print(tag.name, " no string")
            pass
    with open(OUTPUTFILE, 'w', encoding='utf-8') as outp:
        outp.write(str(soup))
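
One possible reason the loop above leaves whitespace behind: tag.string is None for any tag with more than one child, so the whitespace-only strings that prettify() puts between sibling tags are never visited. The sketch below only illustrates that behaviour. The sample markup is made up to mimic prettified output, and it uses "html.parser" rather than "lxml" simply to avoid the extra dependency.

```python
from bs4 import BeautifulSoup, NavigableString

# Made-up sample that mimics prettify() output: whitespace sits between the tags
html = "<body>\n <h1>\n  This is a Heading\n </h1>\n <p>\n  blue car\n </p>\n</body>"
soup = BeautifulSoup(html, "html.parser")

# <body> has several children (whitespace strings and tags), so .string is None
body_string = soup.body.string

# <h1> has exactly one child, a string, so .string returns that string
h1_string = soup.h1.string

# Direct string children of <body>: these are what a loop over tag.string skips
skipped = [c for c in soup.body.children if isinstance(c, NavigableString)]

print(repr(body_string))
print(repr(h1_string))
print(skipped)
```

Iterating over soup.find_all(string=True) instead of over tags reaches those in-between strings as well.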

***snippsat*** · May-01-2025, 11:15 AM

To show the problem.

from bs4 import BeautifulSoup

html = '''\
<body>
<h1>This is a Heading</h1>
<p>This is a paragraph</p>
<p>blue car</p>
</body>'''

soup = BeautifulSoup(html, 'lxml')
print(soup.prettify())
print('-' * 25)
print(str(soup))

Output:
<body>
 <h1>
  This is a Heading
 </h1>
 <p>
  This is a paragraph
 </p>
 <p>
  blue car
 </p>
</body>
-------------------------
<body>
<h1>This is a Heading</h1>
<p>This is a paragraph</p>
<p>blue car</p>
</body>

So the added newlines are annoying (I tried to fix this a long time ago); now I just use the workarounds below.
An easy fix is to use an online HTML formatter, e.g. Code Beautify.
Or install Prettier, which has a command-line tool: running prettier --write . in a folder formats all the HTML files in it.

G:\div_code\html_file
λ prettier --write .
h1.html 170ms
h2.html 5ms

After that, files saved with either of the Beautiful Soup options above will be correctly formatted HTML.

Output:
<body>
  <h1>This is a Heading</h1>
  <p>This is a paragraph</p>
  <p>blue car</p>
</body>

Winfried · May-01-2025, 03:46 PM

Thank you.

Winfried · May-11-2025, 05:32 PM

For others' benefit, here's how to do it in Beautiful Soup:

import glob
import os
import shutil

from bs4 import BeautifulSoup

ROOT = r"c:\temp"
os.chdir(ROOT)

for file in glob.glob("*.html"):
    print("Handling", file)

    # Save a copy of the original file, keeping its original timestamps
    ORIGFILE = f"{file}.orig"
    st = os.stat(file)
    tup = (st.st_atime, st.st_mtime)
    shutil.copyfile(file, ORIGFILE)
    os.utime(ORIGFILE, tup)

    # Remove all newlines
    # (assumes the files are UTF-8, which is how they are written back below)
    with open(file, "r", encoding="utf-8") as f:
        dna = f.read().replace("\n", "")

    # Trim leading and trailing whitespace from every string in the soup
    soup = BeautifulSoup(dna, "lxml")
    for s in soup.find_all(string=True):
        s.replace_with(s.text.strip())

    # Save the soup back to the file
    with open(file, "w", encoding="utf-8") as outp:
        outp.write(str(soup))

    # The file must be closed before its timestamps are restored
    os.utime(file, tup)
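
A possible refinement of the script above, as a sketch only: find_all(string=True) also returns comments and the doctype, and replacing those with a plain string turns them into ordinary text. The hypothetical helper below (deprettify, KEEP_AS_IS and the sample markup are names and data invented for this example) skips those special strings and leaves whitespace-sensitive tags such as pre alone. It uses "html.parser" by default to avoid the lxml dependency; pass "lxml" to match the script above.

```python
from bs4 import BeautifulSoup, NavigableString

# Tags whose content is whitespace-sensitive; leaving them untouched is an assumption
KEEP_AS_IS = ["pre", "textarea", "script", "style"]


def deprettify(html: str, parser: str = "html.parser") -> str:
    """Hypothetical helper: strip the whitespace prettify() adds around strings."""
    soup = BeautifulSoup(html, parser)
    for s in soup.find_all(string=True):
        # Skip Comment, Doctype, CData, etc. so they are not turned into plain text
        if type(s) is not NavigableString:
            continue
        # Skip strings anywhere inside a whitespace-sensitive tag
        if s.find_parent(KEEP_AS_IS) is not None:
            continue
        stripped = str(s).strip()
        if stripped:
            s.replace_with(stripped)
        else:
            s.extract()  # drop whitespace-only strings between tags
    return str(soup)


# Made-up sample that mimics prettify() output
sample = (
    "<!DOCTYPE html>\n<html>\n <body>\n  <!-- note -->\n"
    "  <p>\n   blue car\n  </p>\n  <pre>a\n  b</pre>\n </body>\n</html>"
)
print(deprettify(sample))
```

One caveat that applies to both versions: stripping every string also removes the space that separates text from an adjacent inline tag, so "a", "<b>bold</b>", "word" comes out as a<b>bold</b>word. The prettified file no longer records whether the original had a space there, so check a few files before converting a whole folder.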
