Finance is an industry highly focused on efficiency. Just look at high frequency trading. Milliseconds and Millimeters are key to making trades faster. Python is at the forefront of not just trading but generally making financial services, fintech, any industry tangential to finance run smarter and faster. If we look back, spreadsheets have been the key piece of software in finance. They are the best way to visualize financial data, and they allow the users to see relationship in their data better than any other software.
Another major plus of spreadsheets is the open environment for analysis. Spreadsheet analysis can range from extremely structured — one large ledger – to a series of interconnected tables aimlessly placed across the worksheet. It is a perfect environment for scratchwork. Spreadsheets provide a flexible environment regardless of if you are doing ad hoc analysis all the way to production level financial models. Yet, spreadsheets do have some major downsides that drive this need for up skilling, which I will delve further into below.
The Benefits of Learning Python in Finance
The first benefit is that Excel users in finance can use Python to access larger data than they can in Excel. For example, If you're using Excel, you'll get limited at just over 1 million rows of data, but as I'm sure you've experienced, even at 50,000 rows, a spreadsheet can lag and become unusable. Finance often relies on large datasets, and they are getting larger. For example, a lot of banks are started to centralize all their different datasets in Snowflake database. This is great because you can combine things like industry data, user data, training data all together and join the datasets together when needed. It also helps with the security and permissioning of access to certain datasets. The issue is that much of this data cannot be accessed in a spreadsheet. Connecting a spreadsheet to a database is a challenge and getting the data down to a size, where it can fit is not easy as well.
Data analysis in Python can easily access large datasets – in fact it was made for it. Pandas, the most common Python package for data science, was created by Wes Mckinney, while he was at AQR Capital, a hege fund. The was that there should be a Python package that allows users to interact with millions of lines of data in a tabular mode. Pandas and other Python packages consistently handle millions of lines of data with no issue.
Not only can Python access these larger datasets but it can do so quickly. In finance, spreadsheet users are consistently waiting on slow spreadsheet to load. Python can process these same data sizes way faster, leading to less process speed delays and ability to get real time insights in actual real time.
Key applications of Python in finance and fintech
One of the main ways Python can be used in finance is for the automation of reports. Let's take a stock portfolio performance report. This final form of this report might just be a large pivot table. If the pivot table is aggregating on 10s of thousands of records, it could take a considerable amount of time to load each time it needs to be updated. In Python, this report could set to update automatically on the day, hour or minute, and consistently pull in real time data when it does. Certainly, there are processing speeds in Python as well, but these are negligable compared to a spreadsheet. Automation is a benefit for efficiency reasons, but also because it is provides one centralized source of truth. Financial firms with reporting processes that run on spreadsheets rarely run on just one spreadsheet. Different components and data sources for the report exists in different spreadsheets that are sometimes rather losely connected. Python brings everything to the same environment:
- Connect to your data sources
- Clean you data
- Transform your data
- Visualize your data
All in one environment and process.
Python also allows for much more rigorous testing and checking. A common group of spreadsheet workflows can be called data validation. Data validaiton is made up of data checks that one uses to search for discrepencies or anomolies in a dataset or an output. For example, you have a DCF model and you want to check that the Net Debt is the same in the input tab as it is in the DCF tab. This is a rather simple check that human eyes can do by clicking through the tabs. But now imagine it is end of quarter and you need to do this check on 80 models. Suddenly you have a day's worth of work on your hands. A python script could run this check across all the models instantly. Certainly it would take some to create the script. Here is one way of doing it:
- import pandas and numpy
- upload both datatables and dataframes (use Openpyxl – as Python package that reads Excel files intelligently to specific the ranges for the tables)
- Join the tables on the value titles
- create a helper column with a conditional statement that returns true if the values match and false if they don't
This process might take a few hours to code correctly (below we'll dive into some tools that will make this go faster), but the next quarter you won't have to spend any time, just pass in your new files and it will return the outputs of your checks.
Why Up Skilling is Important
Alot of people will read the above and ask "Why don't financial institutions just hire Python programmers?" The short answer is – they are. But they are also training their Excel based employees on Python – which is a smart decision for a few reason. The main reason is that Python is not going to take over all Excel processes. Python will continue to a helpful tool for financial analysts, but it will not dominate their workflows. Up Skilling through trainings and Python packages made for spreadsheet users.
Up skilling allows financial analysts to make the most of their spreadsheet skills while, supplementing in Python in the places that add the most value. It is also important to note that have a strong financial background is invaluable when you are completing financial tasks in Python. The person creating the data check needs to understand why these checks are important and what values to be checking.
How to Transition from Excel and Sheets to Python
I will give a brief overview, but you can find a much more thorough explanation here.
There are two fronts here. There is learning Python and there is taking Excel processes and bringing them to Python. When it comes to learning Python, I've found that being domain specific is best way to get the training to stick. Get to the value fast!
Here are the things you'll want to start with in the training:
- An introduction to Jupyter and how to access your Python environment
- How to add data to your Python environment (especially Excel and CSV files)
- How to clean data (drop null values, edit strings, etc.)
- How to transform data (groupby's/pivot tables, VLOOKUPs etc. )
- How to visualize data (Matplotlib, Plotly)
It is important to show the user normal spreadsheet tasks as early as you can. Here they can directly see the benefits compared to their normal processes.
When it comes to using Python to automate Excel tasks, there are a few key things to consider. Firstly, you're not going to take a full LBO or DCF and make it fully Python – atleast not for a while. You want to find a task that is mostly tabular. This is because it matches the Pandas Dataframe data structure – rows and columns.
Another key part of transitioning finance users from spreadsheets to Python is providing them the right packages to do so.
Mito is a Python package that let's the user's access Python, while editing a spreadsheet. You can call Mito into your Python environment and edit your data as if it were in Excel or Sheets, but each edit will generate the equivalent Python.