Automating Spreadsheets with Python 101 | Mito

Automating Spreadsheets with Python 101

Ready to write Python code 4x faster?

Looking for a bit of automation, eh? We get it. That weekly_report.xlsx sheet started simple enough, with just a data pull tab and a calculations tab. Things spiraled from there. Now you’re at twenty tabs, and while tab growth isn’t slowing down, that spreadsheet certainly is. You have to update the damn thing manually every week, which takes hours you’d rather spend literally anywhere else.

In the past two years of building a spreadsheet that writes Python code for you as you edit, we’ve helped hundreds of folks transition their spreadsheet reporting processes from a massive-and-totally-gross weekly_reports.xlsx to fully automated Python scripts. I’ll share the most important lessons we’ve learned from these automations:

  1. What processes are good targets for automation with Python
  2. How to get your teammates onboard with automation
  3. How to start a Python automation
  4. How to take a Python automation over the finishline

This blog post is written for folks who’ve never used Python before but are curious how to start using it to automate their data work, and save hours a week they would normally be updating spreadsheets. Read on if that is you!

Not the sort of Python we'll be using.

What makes a good target for automation?

With the exception of the internet, spreadsheets are the most popular user-facing software in the world. From company valuation models to monthly reporting tasks to entire video games (the horror) to data entry forms — some spreadsheet use cases are better suited for Python automation than others.

Here’s how to tell the difference between a good and bad Python automation target.

Signs a Spreadsheet is Not a Great Fit for Automation

  1. Complex financial models: sorry to be the bearer of bad news, but trying to reimplement your LBO model in Python is something you will probably regret. The complex interplay of hundreds of variables results in Python scripts that are even less maintainable than most of their Excel counterparts. Unless it’s a simple model, you and your teammates are probably better sticking with Excel.
  2. Data entry portals: if other end users need to interact with your Excel document to enter data, then switching to Python can be tricky. We’ve seen Excel used for data entry in all sorts of places, including a 13-division international company using Excel to standardize their financial statements. The common story is that that Excel is the lowest common denominator for data entry - so having to rebuild this shared infrastructure can add months onto a project! If the Excel is primarily used for data entry, it’s not the best automation use case.
  3. Lots of distinct tabs: if you have an Excel file with 10 tabs, each of them very different from the one before, then you’ll need to automate each of them seperately with Python. Of course, this can still be worth it, but as we’ll explore below, a spreadsheet with lots of similar tabs allows Python to really shine!

None of these signs are immediately disqualifying, but the Excel files described above end up being a lot harder to automate, and result in lower return on investment. Try and balance these with what makes an Excel file good for automation below!

Good for Python Automation

  1. Repeated reporting tasks: most obviously, if you find yourself returning to an Excel document on a daily, weekly, or monthly basis - it’s likely a good target for automation. Especially if you’re performing very similar actions whenever you return to the document.
  2. Your Excel file is suffering: you find yourself looking at the .xlsx file with anger and distain, while also bragging to your friends about that “one insane Exel file” you have. I recently helped a Mito user automate an Excel file that took 10 minutes to open, 10 minutes for every edit, and would refuse to save most of the time, leading to one of the most painful spreadsheet editing processes I’ve ever had. If Excel can’t handle it, it might be time to upgrade to Python.
  3. Excel functonality limits: if you want to do some more complex operations that Excel does not support, then it might be time to move to Python. In practice, we’ve seen users who are moving to Python so they can take advantage of it’s multitude of ML libraries, easier integration with other data sources, and more.
  4. Repeated/similar tabs: imagine an Excel with the following tabs: input_data, Jan 2020, Feb 2020, …, Dec 2020, …. In each case, the month tab corresponds to the same sort of filtering and transformations of the input data. In cases like this, Python can really shine. By writing some Python code that can handle one month, you can then turn it into a function that handles all months. As a result, making a 100 tabs is as easy as making 2!

Getting your teammates onboard with automation

Now that you know the Excel process you’re automating, let’s get started writing Python! For a non-technical team, automation can be new and off-putting, and so it makes sense to get your teammates onboard before you start. Here are the most effective ways we’ve found of getting your team bought in to your automation:

  1. Communciating the cost of non-automation. This is the most important step. If you can communicate to your teammates how much time/pain the current spreadsheet process is taking, you’ll have an easier time convincing your team that automation is worth considering! Take the time to paint a picture about what a successful automation could look like as well - so your teammates can picture the finish line. From “this takes X hours a week” to “we always have updated data.”
  2. Setting expectations about the automation process. Automation never goes as smoothly as one would like, especially if it’s your first time. Of course, this doesn’t mean it isn’t worth it, it just means you should help your teammates understand that there’s some uncertainty, and while the potential benefits are high, they should expect some bumps in the road.

Practically, this means you should probably try and avoid giving your teammates aggressive estimates about how long this will take: it will probably take longer than you expect! Sharing this article with your teammates is one way to set shared expectations about the automation process.

Getting started: your first Python automation

The Target Process

Great, the team is onboard! Let’s imagine you’ve decided to automate a weekly reporting process that has the following shape:

  1. transactions: this tab is the first in the Excel sheet. It contains all transactions ever made by the marketing division at your company. You update this by running a SQL query in your company financials database, and then copying the new rows here.
  2. campaigns: this tab contains all current and historical marketing campaigns the marketing division has ever run, and how they were performing. This is pulled from your CRM software in a similar way to above.
  3. Jan 1 2020: this tab pulls data from transactions, and calculates some summary metrics about the transactions. It also pulls in campaigns running during this week, and reports some metrics about how they were performing with a few pivot tables.
  4. Jan 8 2020: same as above, but on a new week!
  5. Feb 1 2022: same as above, but updated for a new week. There are some additional statistics you have to add here for month end as well!

There’s only one problem: you’re running out of space in the transactions tab. And there are >100 tabs, making the Excel very slow. It’s time to automate!

Step 1: Complete a First Target

The first step is to select a small piece of the Excel to get started with. The key is to pick the smallest possible task that still moves you towards the goal of full automation. Because you’re new to Python , you’ll only be hurting yourself by being ambitious. Just automate anything to get started.

In my experience doing these automations, there are usually two good targets: a data pull tab, or a single transformation tab. Indeed, it may even make sense to start by trying to replicate a single pivot table on a single transformation tab.

In our case, either automating the SQL pull for the transactions tab or the campaigns tab is both doable and relatively easy in Python. This is probably a good first target!

Make sure to select an automation target that you’re confident the Excel file calculates correctly. Checking the accuracy of Excel and Python work at the same time is much harder than just checking one at once — so make sure you’re confident in the Excel work you’re replicating!

Step 2: Recon Target

Now that you’ve automated this first tab, take some time to and check it.

Practically, this means using Python to read in your Excel document, and then comparing the Excel results with the result of your calculations. In our example, compare the results of your transactions SQL pull to the data in the transactionstab in the Excel. You’ll probably notice you forgot to change the datatype of a string to a number or something similar — and then fix it up!

By selecting a small target in step 1, this recon should not be complicated. And as a result of this recon, you can confidently move on to the next step knowing that you are finished with this one — something that is almost impossible if you have a bunch of unfinished steps you’re trying to combine together!

This is also a great place to update your teammates on your progress. They might rely on the Excel file you’re automating, or you might be relying on them to explain some of it’s details. In either case, keeping them updated on your progress helps everyone stay bought in on the automation.

Step 2.5: Cleanup

Before you call the recon done, clean your script up! You probably will have lots of random print statements, badly named variables, code that you have to change when you rerun it, and much more.

Take a few moments and clean up the script. Remove unnecessary elements, write comments that explain why/how a complex chunk of code works the way it does, rename variables to be more informative, and generally try and support yourself as you continue the automation over the following weeks!

Make sure your cleanup doesn’t break anything!

Step 3: Repeat!

Select a small piece of the process. Automate it. Recon it. Combine it with the other automated steps in the process. Check them together. And continue.

Once we automate the transactions pull and check it, then we can move on to the campaigns pull. Then we can move on to Jan 1 2022 tab. Once we recon that, we can then move to the next week - trying to generalize our code so it works for both tabs!

This iterative process makes it so you can have confidence that you automation is working as expected, and makes it much easier to debug when things go wrong.

Step 4: Present your Automation to the Team

And with that, you’re done! You have a Python script that automatically does what used to take hours of manual work.

Now it’s time to share this work with your team. Communicating effectively about an automation gives your teammates a chance to acknowlege your work, buys you more opportunities for future automations, and can inspire others on your team to do the same. … and maybe get you a promotion!

In presenting your automation to your team, we’ve found that it’s best to focus on the costs to the team of no automation and the benefits of the Python code you’ve written. For the Python curious, you can give them a peek at the code, too :)

Other automation project management tips

  1. If you want to start with a calculation tab as the first automation target, you can use the Excel workbook as your intial data source. This can be a very effective way to skip annoying database permission requests within your company and get to the calculation part of the automation!
  2. Continue using the old Excel until the Python process is finished: the only thing worse than a broken automation is a broken automation that someone is waiting on. Bugs have a tendancy to pop up in the final moments — so before getting rid of a critical old process, ensure that the automation is entirely, 100% finished!
  3. Don’t change the process as you’re automating it: it’s easy to go mad with power, and Python can be so efficient that it can be tempting to add to the process. I have seen more issues as a result of changing the process while automating it than anything else. Resist this at all costs. First, finish the automation - then, change it however you want!
  4. Recon the full process when you’re 100% finished: this one is obvious, but check your work! The two tips above will make this much easier as well.

Using Mito for Python Automation

None of this blog post is specific to Mito! Rather, all of these tips apply no matter how you’re choosing to write your Python script.

However, at Mito, we’ve taken the years of experience we have helping users automate spreadsheet processes and built a tool to make this easy. It’s called Mito — and it’s a spreadsheet that generates Python code for you as you edit.

You’ll notice there’s an implicit step inbetween selecting a first target for automation and reconing it where you actually have to write the code! Before Mito, you were responsible for taking a Python training course, or perhaps getting lost in stack overflow for weeks. With Mito, you can just edit a spreadsheet how you’re used to — and Mito will write production-ready Python code for you! Then, you can take this Python code and continue to edit it without needing Mito at all.

Ready to write Python code 4x faster?