Wall Street's Transition from Excel to Python | Mito

Wall Street's Transition from Excel to Python

Ready to write Python code 4x faster?

No, Spreadsheets Are not Going Away

The common first question I get is “why do you think Python can eat Excel”? This is the wrong approach, in my opinion. Python is not going to eat Excel – it is too foundational (large parts of the global economy depend on spreadsheets – not without detriment). But other than market share, it’s just very good at a lot of things.

Spreadsheets are an amazing piece of software, arguably the most successful piece of software ever — Excel itself has over 750 million users. If they are going away, it is not on a time horizon worth prioritizing around. Excel and Google Sheets may lose some market share with new innovative spreadsheets, but if you are looking to tabulate simple data, track some records, or run simple models, Excel is still the platinum standard. In my experience, Python is best used as an augmentation to workflows, eating up the parts where Excel lacks.

Spreadsheet Pain points in Finance

I am explaining these pain points in a financial context, but people from all industries will recognize them.

Slow!

Financial analysts can spend hours a day waiting for models to load/refresh. This could be because of complex formulas or connections to external data sources. Spreadsheets are great when they are a nimble, flexible canvases for analysis – waiting for pivot tables to load is decidedly not nimble.

Can’t handle large data sizes

In finance, analysts are often presented datasets that don’t fit into a spreadsheet, so they need to pair it down. This eats up time, and in some cases diminishes the amount of the data taken into account for an analysis. It can also lead to repeated processes across subsections of a dataset, since the entirety can’t fit in a single spreadsheet.

Lack of repeatability

Spreadsheets can be overly manual and hard to automate. In finance, we see this with data validation work. Combing through a 20 tab spreadsheet to make sure certain values match is a prime process for automation, but VBA is lacking at best.

Python Takes Hold in Finance

Pandas, the most common Python library for data analysis, was developed by Wes Mckinney while at AQR capital, a hedge fund. Now pandas is a staple for financial analysis in Python.

Other great packages for Python financial analysis include:

  1. Pyfolio – perform risk analysis on financial portfolios (developed by Quantopian)
  2. Statsmodels – run statistical tests and data exploration on your data
  3. Zipline – an essential package for connecting to live exchanges for high frequency trading

With packages like these, we see quant teams grow in size at financial institutions, but a bifurcation grows between the Python based teams and the Excel based as well. Python is incredibly powerful, but it can be hard to pickup, which can make collaboration between these teams a challenge.

Bridging the Gap between Python and Excel

Fortunately, Python packages have come along that allow Excel users to start bringing their workflows to Python. Here is a write-up on which workflows are best to start transitioning.

Here are a few packages to consider as you begin to transition financial workflows to Python:

  1. Mito (I’m one of the creators)

Mito is a spreadsheet interface for Python. You can open Mito in your Python environment and each edit you make in the Mito spreadsheet will generate the equivalent Python. This is a great for spreadsheet users who want to start using Python, but don’t want to spend months learning the syntax.

  1. Openpyxl

This package makes reading and writing spreadsheets with Python much easier than using general packages for tabular data ingestion. You can manipulate your Excel file, such as adding and deleting rows, editing formulas, and using filters, all from your Python environment. You can also make new Excel workbooks from scratch. Imagine you wanted to programmatically create a new workbook three times a day — this is hard to automate in Excel, but Openpyxl makes it much more manageable with Python.

  1. Lux

This package is not spreadsheet based, but it easily allows non-technical users to create visualizations in Python. All the user does is pass in their dataset, and Lux auto-generates visualizations for that data, which the user can select and use in their notebook.

One important benefit of Python not discussed here, is the ability to directly access databases. I will cover that more in-depth in a later post :)

Ready to write Python code 4x faster?