Daniyar Shekebayev / Oct 13 2020 / Published
Remix of Python by Nextjournal
Publishing pandas data frames to Tableau Online via a Python environment
Based on this article by Eric Chan
First, download the Tableau Extract API and save it in the /results folder for later use:
wget -q --show-progress --progress=bar:force -P /results https://downloads.tableau.com/tssoftware/extractapi-py-linux-x86_64-2019-2-6.tar.gz
Next, untar the archive and install Tableau's Python SDK, then add pandleau, Tableau Server Client, Modin and pandas, and finally update conda:
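# Unpack the archive that the previous step saved in /results
tar zxvf /results/extractapi-py-linux-x86_64-2019-2-6.tar.gz
cd hyperextractapi-py-linux-x86_64-release_2019_2.2019.2.6.199.r40e5865b/
# Install the Extract API into the active Python environment
python setup.py install
# Install Tableau Server Client, Modin and pandas from conda-forge
conda install -c conda-forge tableauserverclient modin pandas
# Install pandleau without pulling in its dependencies
pip install pandleau --no-deps
conda update -n base -c defaults conda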
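The untar step can also be done from Python with the standard-library tarfile module, which reports the extracted top-level directory instead of requiring its long name to be typed by hand. This is only a minimal sketch: the helper name extract_archive and its arguments are hypothetical and not part of the original notebook.

```python
import os
import tarfile


def extract_archive(archive_path, dest_dir):
    """Extract a .tar.gz archive into dest_dir and return its top-level paths."""
    os.makedirs(dest_dir, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as tar:
        # Collect the first path component of every member, ignoring "."
        top_level = {
            os.path.normpath(member.name).split(os.sep)[0]
            for member in tar.getmembers()
        } - {"."}
        tar.extractall(dest_dir)
    return [os.path.join(dest_dir, name) for name in sorted(top_level)]


# Hypothetical usage with the archive downloaded earlier:
# extract_archive("/results/extractapi-py-linux-x86_64-2019-2-6.tar.gz", "/results")
```

The returned path is the directory in which python setup.py install would then be run.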
# Configure Modin: 4 CPUs, the Ray engine and the pandas backend
export MODIN_CPUS=4
export MODIN_ENGINE=ray
export MODIN_BACKEND=pandas
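Variables exported in a Bash cell are only visible to processes started from that shell. If the Python cells run in a separate process, the same settings can be applied from Python before Modin is imported. The sketch below mirrors the three variables above; the helper name configure_modin is hypothetical, and setting the variables before the import is assumed here to be the safe order.

```python
import os


def configure_modin(cpus=4, engine="ray", backend="pandas"):
    """Set the Modin environment variables used in this notebook."""
    settings = {
        "MODIN_CPUS": str(cpus),   # environment values must be strings
        "MODIN_ENGINE": engine,
        "MODIN_BACKEND": backend,
    }
    os.environ.update(settings)
    return settings


configure_modin()
# import modin.pandas as pd  # import Modin only after the variables are set
```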
How to use this runtime
Import the environment container with all the necessary packages
python-tableau-online
Download as Docker image from:
This image was imported from: docker.nextjournal.com/environment@sha256:5cf7d089d7a2d3a1955eb2f8a9f7259bc57f44cb9f0b8f1393aa97cbf8ce3639
Populate your dataframe with data first, for example from a large CSV file read in chunks to avoid memory errors:
import modin.pandas as pd
tfr = pd.read_csv("/path/to_csv", chunksize=500000, iterator=True)
df = pd.concat(tfr, ignore_index=True)
print(df)
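Note that concatenating every chunk still ends up with the whole file in memory, so chunking saves the most when each chunk is reduced before it is kept. The sketch below shows that pattern with plain pandas rather than Modin, on a small in-memory CSV; the column names and the filter are made up for illustration.

```python
import io

import pandas as pd

# Small in-memory CSV standing in for a large file on disk (made-up data)
csv_data = io.StringIO("id,value\n1,10\n2,20\n3,30\n4,40\n5,50\n")

# With chunksize set, read_csv returns an iterator of DataFrames
reader = pd.read_csv(csv_data, chunksize=2)

# Filter each chunk before keeping it, so only the needed rows stay in memory
kept = [chunk[chunk["value"] >= 30] for chunk in reader]
df_small = pd.concat(kept, ignore_index=True)
print(df_small)
```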
Now let's create a Python function, publishtotableau, that publishes a pandas dataframe as a data source in Tableau Online, so that we can reuse it later in code:
import os

import modin.pandas as pd
import tableauserverclient as TSC
from pandleau import *

def publishtotableau(df, folder_path, projectid, datasource_name, auth_list, site='yoursite'):
    """
    Login to Tableau Online and publish a pandas dataframe

    Assumes the following packages are imported:
    - tableauserverclient as TSC
    - pandleau import *
    - pandas as pd

    Args:
        df: dataframe to publish
        folder_path: folder to store temp.hyper file generated
        projectid: Tableau Server Project ID
        datasource_name: Name of the datasource to publish
        auth_list: List-like with username on index 0, password on index 1
        site: Tableau server site

    Returns:
        None
    """
    # Write the dataframe to a temporary .hyper extract
    hyper_path = os.path.join(folder_path, 'temp.hyper')
    pandleau(df).to_tableau(hyper_path, add_index=False)

    # Sign in; replace the URL with the address of your own Tableau Online pod
    tableau_auth = TSC.TableauAuth(auth_list[0], auth_list[1], site_id=site)
    server = TSC.Server('https://10ax.online.tableau.com/', use_server_version=True)

    with server.auth.sign_in(tableau_auth):
        # Publish the extract, overwriting any data source with the same name
        mydatasourceitem = TSC.DatasourceItem(projectid, name=datasource_name)
        item = server.datasources.publish(mydatasourceitem, hyper_path, 'Overwrite')
        print("{} successfully published with id: {}".format(item.name, item.id))
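Since auth_list carries a username and password, one way to keep credentials out of the notebook is to read them from environment variables. This is only a sketch: the helper load_auth_from_env and the variable names TABLEAU_USER and TABLEAU_PASSWORD are hypothetical, as are the project ID and data source name in the commented call.

```python
import os


def load_auth_from_env(user_var="TABLEAU_USER", password_var="TABLEAU_PASSWORD"):
    """Return [username, password] read from environment variables."""
    missing = [name for name in (user_var, password_var) if not os.environ.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return [os.environ[user_var], os.environ[password_var]]


# Hypothetical call, assuming publishtotableau and df are defined as above:
# publishtotableau(df, "/results/", "your-project-id", "my_datasource",
#                  load_auth_from_env(), site="yoursite")
```

The returned list matches the layout the function expects: username at index 0, password at index 1.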
The same loading step with plain pandas instead of Modin, together with the imports the function relies on:
import pandas as pd
import tableauserverclient as TSC
from pandleau import *
tfr = pd.read_csv("/path/to_csv", chunksize=500000, iterator=True)
df = pd.concat(tfr, ignore_index=True)
print(df)