Working with Documents - Python .docx Module

The python-docx library lets you create, modify, and extract information from Word documents (.docx) programmatically.

Here's how you can work with the python-docx module to handle .docx files:

Installation

You can install the python-docx module using pip:

pip install python-docx 

Creating a New Document

from docx import Document

# Create a new Document
doc = Document()

# Add a heading
doc.add_heading('Document Title', 0)

# Add a paragraph
p = doc.add_paragraph('A plain paragraph having some ')
p.add_run('bold').bold = True
p.add_run(' and some ')
p.add_run('italic.').italic = True

# Add a page break
doc.add_page_break()

# Save the document
doc.save('demo.docx')

Opening an Existing Document

from docx import Document

# Open an existing document
doc = Document('existing_document.docx')

# Print each paragraph
for para in doc.paragraphs:
    print(para.text)

Adding Other Elements

from docx.shared import Inches

# Add a heading at level 1
doc.add_heading('Heading Level 1', level=1)

# Add a bullet list (the style name contains a space: 'List Bullet')
doc.add_paragraph('Item 1', style='List Bullet')
doc.add_paragraph('Item 2', style='List Bullet')

# Add a numbered list (the style name contains a space: 'List Number')
doc.add_paragraph('First item', style='List Number')
doc.add_paragraph('Second item', style='List Number')

# Add a picture (picture.png must exist in the working directory)
doc.add_picture('picture.png', width=Inches(1.0))

# Save the document
doc.save('demo.docx')

Modifying Document Style

from docx.shared import Pt
from docx.enum.text import WD_ALIGN_PARAGRAPH

# Add a centered paragraph
paragraph = doc.add_paragraph('Centered Text')
paragraph.alignment = WD_ALIGN_PARAGRAPH.CENTER

# Change font size
for run in paragraph.runs:
    run.font.size = Pt(24)

# Save the document
doc.save('demo.docx')

Reading a Document

# Print each paragraph's text in the document
for para in doc.paragraphs:
    print(para.text)

# Print each table's data
for table in doc.tables:
    for row in table.rows:
        for cell in row.cells:
            print(cell.text)

Remember, python-docx can only create, read, and modify .docx (Office Open XML) files, not the older .doc (binary) format.
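
Because a .docx file is a ZIP package of XML parts, while a legacy .doc file is not, a quick check before opening can catch the wrong format early. The helper below is a rough sketch using only the standard library. The name looks_like_docx is hypothetical, and the check is a heuristic rather than full validation of the package.

```python
import io
import zipfile


def looks_like_docx(source):
    """Return True if source (a path or binary file object) looks like a .docx.

    Heuristic only: checks for a ZIP archive that contains word/document.xml.
    """
    if not zipfile.is_zipfile(source):
        return False
    with zipfile.ZipFile(source) as archive:
        return "word/document.xml" in archive.namelist()


# Demo with in-memory data so the sketch runs without any files on disk
fake_docx = io.BytesIO()
with zipfile.ZipFile(fake_docx, "w") as archive:
    archive.writestr("word/document.xml", "<w:document/>")

# Bytes that start with the signature used by legacy binary Office files
legacy_doc = io.BytesIO(b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1" + b"\x00" * 64)

print(looks_like_docx(fake_docx))   # True
print(looks_like_docx(legacy_doc))  # False
```

A file that fails this check would need to be converted to .docx (for example, by saving it from Word) before python-docx can open it.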

Extracting Text from a Document

# Extracting and printing text from all paragraphs
full_text = []
for para in doc.paragraphs:
    full_text.append(para.text)

print('\n'.join(full_text))

When working with documents, you might need to consider more advanced features like styles, templates, sections, headers, footers, etc. The python-docx library provides a rich set of features to handle these as well, though some advanced Word features are not yet fully supported.

