3 Python functions to automate boring office tasks

The Financial Minimalist
5 min readJun 21, 2020
Source: Freepik.com

“Can you go through these Excel files, remove any rows that are duplicated in this other reference Excel file, and combine them into 1 Excel file?” Your manager asks, and you sigh in resignation at the mundane task ahead.

Sounds familiar? You are not alone — the average office executive spends over one-third of their time performing routine, administrative tasks, according to a global productivity study by Unit4. I’m sure I am not the only one thinking this, but surely we can spend our time more fruitfully to deliver value instead of wasting it on repetitive tasks!

Enter Python, which I am sure many of us would have heard of, but haven’t had the time to explore it. Some of us think that’s something for software engineers or data scientists to worry about, but really, Python is so versatile that everyone will benefit from it. For the record, I am a food chemist, and Python has led to a 6x increase in efficiency when I need to process data. I wouldn’t consider myself a programmer by any measure, and I know many of you who are reading this won’t consider yourself as one too. This is precisely why I wanted to provide this as a layman’s guide for you to get over the most mind-numbing tasks. Make a bookmark of this article, copy-paste the codes wholesale as you see fit, and you are good to go! Of course, if you are interested to find out more, I have provided an explanation (to the best of my ability) on what is being done with each line of code.

But first, download Jupyter notebook here, which is one of the most popular interfaces to code and execute Python scripts. Do note that all Jupyter notebooks run on a web browser such as Internet Explorer, but they are stored in your local hard drive and won’t require an internet connection to open, edit, or run.

You’ll also need to open command prompt and run these codes to make sure that all the packages are installed. The glob and os package (in fact, all the required packages) should already have been installed together with Jupyter notebook.

pip install win32com.client
pip install pandas
pip install imaplib
pip install email

1.Extracting Microsoft Excel/Powerpoint/Word/Outlook files from Microsoft Outlook messages (.msg files)

Microsoft Office files are what we call OLE (Object Linking & Embedding) files, and these may be extracted with only a few lines of code in Python as such:

import win32com.client,os,globdef extract_outlook(dir1,output_file_dir):     
outlook= win32com.client.Dispatch("Outlook.Application").GetNamespace("MAPI")
directory=os.chdir(dir1)
list1=glob.glob('*.msg')
for i in range(len(list1)):
text=dir1+list1[i]
msg1=outlook.OpenSharedItem(text)
for att in msg1.Attachments:
att.SaveASFile(os.path.join(output_file_dir, str(att.FileName)))
return None

The above chunk of code serves to generate a user-defined function, which we will need to apply subsequently as follows:

dir1=input('Please enter the file directory containing your email messages: ')
output_file_dir=input('Please enter the file directory where your attachments will be extracted to: ')
extract_outlook(dir1,output_file_dir)

Use case:
You’ll thank me later when you need to save all the attachments from all the emails since 5 years ago.

2. Combining multiple Excel files into a single file.
Note: All the excel files must have the same column names.

## importing packages
import pandas as pd
import glob,os
def excel_combiner(input_directory):
## build a list of all the excel files
os.chdir(input_directory)
list1=glob.glob(r'*.xlsx') ### or .xls where necessary
## extract the excel files sequentially and add it to the existing dataframe
df=pd.DataFrame()
for i in range(len(list1)):
df_x=pd.read_excel(list1[i])
filename=[list1[i]]*(df_x.index.tolist()[-1]+1)
df_x['Filename']=filename
df=pd.concat([df,df_x])
return df

Once the function is defined as above, copy-paste the block of code below to apply it.

input_directory=input('Please enter the file directory containing all your input files: ')
output_file=input('Please enter the desired filename of your output file: ')
df=excel_combiner(input_directory)
## output the dataframe back into Excel
df.to_excel(output_file)

Some example use-cases:
A) Combining system or instrument-generated logs into 1 single Excel file to locate the system fault (of course, you can do it within Python but that’s for another article).
B) Checking from many Excel files if any entries match with those in a reference Excel. (Wouldn’t it be easier to combine all of them into 1 Excel first, and do vlookup at one go?)

3. Tabulating your Gmail messages in an Excel file
Note: You need to let Gmail allow ‘less secure app access’ here and enable IMAP access here before you proceed.

You should see these once you have enabled less secure app access and IMAP access
## importing packages
import imaplib, email
def extract_email(email, password):
## logging in to Gmail through Python
m = imaplib.IMAP4_SSL("imap.gmail.com", 993)
m.login(email, password)
m.select('"[Gmail]/All Mail"')
## search all email and return uids
result, data = m.uid('search', None, "ALL")
## extracting all components of the email message
if result == 'OK':
for i in range(len(data[0].split())):
res,data1=m.uid('fetch', data[0].split()[i], '(RFC822)')
raw_email = data1[0][1]
raw_email_string = raw_email.decode('utf-8')
email_message = email.message_from_string(raw_email_string)
## extract body of email
for part in email_message.walk():
if part.get_content_type() == "text/plain":
body = part.get_payload(decode=True)
else:
continue
## building lists of email dates, sender email, addressee's email, cc emails, email subjects and email bodies
date1=[]
from1=[]
to1=[]
cc1=[]
subject1=[]
body1=[]
date1.append(email_message['Date'])
from1.append(email_message['From'])
to1.append(email_message['To'])
cc1.append(email_message['cc'])
subject1.append(email_message['subject'])
body1.append(body)
## piecing everything together in a table (dataframe)
df=pd.DataFrame()
df['Date']=date1
df['From']=from1
df['To']=to1
df['Cc']=cc1
df['Subject']=subject1
df['Body']=body1
return df

Again, once the function is defined, use the code below to apply it:

email=input('Please enter your email address: ')
password=input('Please enter your password: ')
output_file=input('Please enter your output filename: ')
df=extract_email(email,password)
## output the dataframe into Excel
df.to_excel(output_file)

Use case:
Let’s say your company has a Gmail account for customer feedback. This set of code will be incredibly useful if you need to visualize trends like how much customer feedback you are getting over different months, or you can even perform some text analysis to classify the feedback sentiment and find out the most raved or hated product or service (Again, that’s probably for another article). The possibilities are endless.

Well, I hope these 3 examples have helped you to save some agonizing hours and stay tuned for more productivity hacks with Python!

--

--

The Financial Minimalist

My 2 areas of passion are in personal finance and data analysis. My stories chronicle my journey towards Financial Independence and share tips on data analysis.