Post

If You Have To Use Excel, Automate It: Formatting with Python

If You Have To Use Excel, Automate It: Formatting with Python

Automate Excel Formatting with Python, Pandas, & XlsxWriter

Sometimes people need to download some data, format it in Excel, and distribute it. And they have to do this every day, and you look at it and think “I bet I could automate that.” And you are right! You could automate it and make someone’s life a lot easier. Don’t be a slacker, help them out.

How it Works

After writing a Pandas DataFrame to an Excel sheet using df.to_excel(writer, ...), you can access XlsxWriter’s workbook and worksheet objects. The library has a ton of formatting options I am not even touching here. If you want to review them all, check out the XlsxWriter documentation.

Let’s walk through a Python script that uses Pandas and XlsxWriter to create a well-formatted Excel report from a DataFrame (df1).

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
import pandas as pd
# Assume df1 is a pre-existing Pandas DataFrame
# For example:
# data = {'ColA': [1, 2, 3], 'ColB': ['X', 'Y', 'Z'], 'ColC': [10.1, 20.2, 30.3], 'ColD': [True, False, True]}
# df1 = pd.DataFrame(data)

# 1. Initialize ExcelWriter with XlsxWriter engine
writer = pd.ExcelWriter('formatted_report.xlsx', engine='xlsxwriter')

# 2. Write DataFrame to a sheet
# index=False prevents writing the DataFrame index as a column in Excel
df1.to_excel(writer, sheet_name='Sheet1', index=False)

# 3. Access XlsxWriter workbook and worksheet objects
workbook = writer.book
worksheet = writer.sheets['Sheet1']

# 4. Format the Header Row
# Define a format for the header cells
header_format = workbook.add_format({
    'bold': True,
    'italic': True,
    'underline': True,
    'font_size': 13,
    'bottom': 2,    # Medium border
    'top': 2,       # Medium border
    'left': 2,      # Medium border
    'right': 2,     # Medium border
    'align': 'center',
    'valign': 'vcenter',
    'bg_color': '#DDEBF7' # A light blue background
})
# Apply the format to the header row (rewriting what Pandas initially wrote)
for col_num, value in enumerate(df1.columns.values):
    worksheet.write(0, col_num, value, header_format)

# 5. Define Formats for Data Cells
# General format for data cells (thin borders on all sides)
format1_all_borders = workbook.add_format({
    'bottom': 1, 'top': 1, 'left': 1, 'right': 1
})

# Specific format for the first row of data (medium top border, thin other borders)
format2_first_data_row = workbook.add_format({
    'top': 2,    # Medium top border
    'bottom': 1, # Thin bottom border
    'right': 1,  # Thin right border
    'left': 1    # Thin left border
})

# 6. Apply Conditional Formatting to Data Cells
# This applies 'format2_first_data_row' to the first row of data (row index 1)
# It checks if cells are not blank (criteria: '>=', value: '""' effectively means not blank for text/numbers)
if len(df1) > 0: # Ensure there is at least one data row
    worksheet.conditional_format(1, 0, 1, df1.shape[1] - 1, {
        'type': 'cell',
        'criteria': '>=', # Applies to cells with any content (numbers or text)
        'value': '""',    # Compares against an empty string
        'format': format2_first_data_row
    })

    # Apply 'format1_all_borders' to all data cells
    worksheet.conditional_format(1, 0, df1.shape[0], df1.shape[1] - 1, {
        'type': 'cell',
        'criteria': '>=',
        'value': '""',
        'format': format1_all_borders
    })

# 7. Add an Excel Table with AutoFilter and Style
if len(df1) > 0:
    column_settings = [{'header': column} for column in df1.columns.values]
    worksheet.add_table(0, 0, df1.shape[0], df1.shape[1] - 1, {
        'columns': column_settings, # Use DataFrame headers for the table
        'autofilter': True,
        'style': 'Table Style Light 1'
    })
else: # Handle empty DataFrame: create table with only headers
    column_settings = [{'header': column} for column in df1.columns.values]
    worksheet.add_table(0, 0, 0, df1.shape[1] - 1, {
        'columns': column_settings,
        'autofilter': True,
        'style': 'Table Style Light 1'
    })

# 8. Set Column Widths
# Adjust these character widths based on your data
worksheet.set_column('A:A', 10)
worksheet.set_column('B:B', 22)
worksheet.set_column('C:D', 11)
worksheet.set_column('E:K', 8.5) # Example range

# 9. Page Setup for Printing
worksheet.set_landscape()      # Set page orientation to landscape
worksheet.repeat_rows(0)       # Repeat header row (row 0) on each printed page

# 10. Save and Close the Excel File
try:
    writer.close() # This also saves the file
    print("Excel file 'formatted_report.xlsx' saved successfully!")
except Exception as e:
    print(f"Error saving Excel file: {e}")

Explanation of the Code

  1. Initialize ExcelWriter with XlsxWriter engine

    1
    
    writer = pd.ExcelWriter('formatted_report.xlsx', engine='xlsxwriter')
    
  2. Write DataFrame to a sheet

    index=False prevents writing the DataFrame index as a column in Excel.

    1
    
    df1.to_excel(writer, sheet_name='Sheet1', index=False)
    
  3. Access XlsxWriter workbook and worksheet objects

    1
    2
    
    workbook = writer.book
    worksheet = writer.sheets['Sheet1']
    
  4. Format the Header Row

    Define a format for the header cells.

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    
    header_format = workbook.add_format({
        'bold': True,
        'italic': True,
        'underline': True,
        'font_size': 13,
        'bottom': 2,    # Medium border
        'top': 2,       # Medium border
        'left': 2,      # Medium border
        'right': 2,     # Medium border
        'align': 'center',
        'valign': 'vcenter',
        'bg_color': '#DDEBF7' # A light blue background
    })
    

    Apply the format to the header row (rewriting what Pandas initially wrote).

    1
    2
    
    for col_num, value in enumerate(df1.columns.values):
        worksheet.write(0, col_num, value, header_format)
    
  5. Define Formats for Data Cells

    General format for data cells (thin borders on all sides).

    1
    2
    3
    
    format1_all_borders = workbook.add_format({
        'bottom': 1, 'top': 1, 'left': 1, 'right': 1
    })
    

    Specific format for the first row of data (medium top border, thin other borders).

    1
    2
    3
    4
    5
    6
    
    format2_first_data_row = workbook.add_format({
        'top': 2,    # Medium top border
        'bottom': 1, # Thin bottom border
        'right': 1,  # Thin right border
        'left': 1    # Thin left border
    })
    
  6. Apply Conditional Formatting to Data Cells

    This applies ‘format2_first_data_row’ to the first row of data (row index 1). It checks if cells are not blank (criteria: ‘>=’, value: ‘””’ effectively means not blank for text/numbers).

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    
    if len(df1) > 0: # Ensure there is at least one data row
        worksheet.conditional_format(1, 0, 1, df1.shape[1] - 1, {
            'type': 'cell',
            'criteria': '>=', # Applies to cells with any content (numbers or text)
            'value': '""',    # Compares against an empty string
            'format': format2_first_data_row
        })
    
        # Apply 'format1_all_borders' to all data cells
        worksheet.conditional_format(1, 0, df1.shape[0], df1.shape[1] - 1, {
            'type': 'cell',
            'criteria': '>=',
            'value': '""',
            'format': format1_all_borders
        })
    
  7. Add an Excel Table with AutoFilter and Style

    The table range should include the header row (row 0) and all data rows. df1.shape[0] is the number of data rows, so the last data row index is df1.shape[0].

    1
    2
    3
    4
    5
    6
    7
    
    if len(df1) > 0:
        column_settings = [{'header': column} for column in df1.columns.values]
        worksheet.add_table(0, 0, df1.shape[0], df1.shape[1] - 1, {
            'columns': column_settings, # Use DataFrame headers for the table
            'autofilter': True,
            'style': 'Table Style Light 1'
        })
    

    Handle empty DataFrame: create table with only headers.

    1
    2
    3
    4
    5
    6
    7
    
    else: # Handle empty DataFrame: create table with only headers
        column_settings = [{'header': column} for column in df1.columns.values]
        worksheet.add_table(0, 0, 0, df1.shape[1] - 1, {
            'columns': column_settings,
            'autofilter': True,
            'style': 'Table Style Light 1'
        })
    
  8. Set Column Widths

    Adjust these character widths based on your data.

    1
    2
    3
    4
    
    worksheet.set_column('A:A', 10)
    worksheet.set_column('B:B', 22)
    worksheet.set_column('C:D', 11)
    worksheet.set_column('E:K', 8.5) # Example range
    
  9. Page Setup for Printing

    1
    2
    
    worksheet.set_landscape()      # Set page orientation to landscape
    worksheet.repeat_rows(0)       # Repeat header row (row 0) on each printed page
    
  10. Save and Close the Excel File

    1
    2
    3
    4
    5
    
    try:
        writer.close() # This also saves the file
        print("Excel file 'formatted_report.xlsx' saved successfully!")
    except Exception as e:
        print(f"Error saving Excel file: {e}")
    
This post is licensed under CC BY 4.0 by the author.