Python Extract HTML Table (Convert to Pandas DataFrame) Tutorial
Examine the HTML
Use an HTML beautifier, such as the HTML Viewer / Formatter on codebeautify.org, to view the HTML in a readable, indented form.
We can simply use pandas.read_html() to read the tables inside a given HTML document. It returns a list of DataFrames, one per <table> element it finds.
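Here is a minimal sketch of that call on a small inline table. The table contents are made up for illustration. The markup is wrapped in io.StringIO because pandas 2.1+ deprecates passing a literal HTML string to read_html(). read_html() also needs lxml (or BeautifulSoup4 plus html5lib) to be installed.

```python
from io import StringIO

import pandas as pd

# Hypothetical page content standing in for real scraped HTML.
html = """
<table>
  <tr><th>Name</th><th>Score</th></tr>
  <tr><td>Alice</td><td>90</td></tr>
  <tr><td>Bob</td><td>85</td></tr>
</table>
"""

# read_html() returns a list of DataFrames, one per <table> in the input.
# A leading row made only of <th> cells is used as the column header.
tables = pd.read_html(StringIO(html))
df = tables[0]
print(len(tables))
print(df)
```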
If you ever run into the error
UnicodeDecodeError: 'cp950' codec can't decode byte 0xe2 in position 4204: illegal multibyte sequence
simply add the parameter encoding="utf-8" to the open() call.[1]
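The sketch below reads a saved page with an explicit encoding and hands the text to read_html(). The file name example_table.html and its contents are hypothetical. Many UTF-8 punctuation characters, such as the en dash, begin with byte 0xe2. Without encoding=..., open() falls back to the platform's locale encoding, which is cp950 on Traditional Chinese Windows, and cp950 cannot decode those bytes.

```python
import tempfile
from io import StringIO
from pathlib import Path

import pandas as pd

# Create a small UTF-8 file so the example is self-contained (hypothetical data).
# "\u2013" is an en dash, encoded in UTF-8 as the bytes e2 80 93.
path = Path(tempfile.gettempdir()) / "example_table.html"
path.write_text(
    "<table><tr><th>Item</th><th>Note</th></tr>"
    "<tr><td>A</td><td>2019\u20132020</td></tr></table>",
    encoding="utf-8",
)

# Explicitly decode as UTF-8 instead of relying on the locale default.
with open(path, encoding="utf-8") as f:
    html = f.read()

df = pd.read_html(StringIO(html))[0]
print(df)
```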
But what if the HTML body contains nested tables?
We can work on the raw string instead: find the n-th occurrence of '<table' and slice away the unwanted <table> elements. Then use the header parameter of read_html() to pick the row that holds the real column names.
Example:
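Here is a minimal sketch of the string-slicing idea. The helper nth_index() and the sample page are hypothetical. An outer layout table wraps the data table, so the second '<table' occurrence is the one we want. The data table's first row is a title line, so header=1 tells read_html() to use the second row as column names.

```python
from io import StringIO

import pandas as pd


def nth_index(s: str, sub: str, n: int) -> int:
    """Return the index of the n-th (1-based) occurrence of sub in s, or -1."""
    start = -1
    for _ in range(n):
        start = s.find(sub, start + 1)
        if start == -1:
            break
    return start


# Hypothetical page: a layout table with the data table nested inside it.
html = """
<table id="layout">
  <tr><td>Site navigation</td></tr>
  <tr><td>
    <table>
      <tr><td>Report 2020</td><td></td></tr>
      <tr><td>Name</td><td>Score</td></tr>
      <tr><td>Alice</td><td>90</td></tr>
      <tr><td>Bob</td><td>85</td></tr>
    </table>
  </td></tr>
</table>
"""

# Keep only the second <table> ... </table> (the inner one has no nesting,
# so the first closing tag after its start is its own).
start = nth_index(html, "<table", 2)
end = html.find("</table>", start) + len("</table>")
fragment = html[start:end]

# Row 0 is the title line; header=1 anchors the column names on row 1.
df = pd.read_html(StringIO(fragment), header=1)[0]
print(df)
```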
But how can we transform the table to the format we want?
Transpose/Transform

Let’s set aside the more complex DataFrame operations such as transposing. A simple and intuitive approach is to loop through the DataFrame and build a new DataFrame.
For tables like the ones above, I’ve written some example code you can refer to.
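Here is a minimal sketch of the loop-and-rebuild approach. It assumes a hypothetical "vertical" layout where read_html() returns one table per record, and each row of a table is a (Field, Value) pair. The sample values are made up.

```python
import pandas as pd

# Two hypothetical vertical tables, as read_html() might return them.
tables = [
    pd.DataFrame({"Field": ["Name", "Date", "Price"],
                  "Value": ["ACME", "2020-05-01", "12.5"]}),
    pd.DataFrame({"Field": ["Name", "Date", "Price"],
                  "Value": ["Globex", "2020-05-02", "8.0"]}),
]

records = []
for table in tables:
    # Walk the rows of one table and collect its pairs into a single dict.
    record = {}
    for row in table.itertuples(index=False):
        record[row.Field] = row.Value
    records.append(record)

# Build a new DataFrame: one row per source table, one column per field.
new_df = pd.DataFrame(records)
print(new_df)
```

The values stay as strings here. pd.to_numeric() or pd.to_datetime() can convert columns afterwards. For a single table, tables[0].set_index("Field").T gives a similar one-row shape without an explicit loop.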
What about datetimes? How do we handle local time if we deploy the app to the cloud, where the server clock is often set to UTC rather than our local time zone?
Time Zone
The classmethod datetime.now(tz=None)[2] accepts an optional tzinfo object, so we can get the current time in a specific time zone. However, "the standard library does not define any timezones, at least not well (the toy example given in the documentation does not handle subtle problems like the ones mentioned here)."[3]
My suggestion is to derive the local time from utcnow() by adding a timedelta instead. For example, if we want the local time fixed to Taipei time (UTC+8), we can just add timedelta(hours=8):
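Here is a minimal sketch of that suggestion. A fixed offset is safe for Taipei because Taiwan does not observe daylight saving time. datetime.utcnow() is deprecated as of Python 3.12, so the sketch gets the same naive UTC value via datetime.now(timezone.utc).replace(tzinfo=None). The function names are hypothetical. The second function shows the aware alternative: it passes a fixed-offset datetime.timezone to datetime.now(tz=...).

```python
from datetime import datetime, timedelta, timezone

# Taipei is UTC+8 all year, so a constant offset is sufficient here.
TAIPEI_OFFSET = timedelta(hours=8)


def taipei_now_naive() -> datetime:
    # Equivalent to datetime.utcnow() + offset: current UTC time,
    # tzinfo dropped, then shifted forward by 8 hours.
    return datetime.now(timezone.utc).replace(tzinfo=None) + TAIPEI_OFFSET


def taipei_now_aware() -> datetime:
    # Alternative: a fixed-offset tzinfo keeps the +08:00 offset attached.
    return datetime.now(tz=timezone(TAIPEI_OFFSET))


print(taipei_now_naive().strftime("%Y-%m-%d %H:%M:%S"))
print(taipei_now_aware().isoformat())
```

The naive version matches the timedelta approach described above. The aware version carries its offset, which helps when comparing against other aware datetimes. On Python 3.9+, zoneinfo.ZoneInfo("Asia/Taipei") is another standard-library option for named zones.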
