Daniyar Shekebayev / Oct 13 2020 / Published
Remix of Python by  Nextjournal
Publishing pandas data frames to Tableau Online
via a Python environment
Based on this article by Eric Chan
First, download the Tableau Extract API and save it in the /results folder for later use.
wget -q --show-progress --progress=bar:force -P /results https://downloads.tableau.com/tssoftware/extractapi-py-linux-x86_64-2019-2-6.tar.gz
modin-tableau (Bash in Python)
Next, untar and install Tableau's Python SDK, add pandleau, Tableau Server Client, modin and pandas, and update conda itself.
# Run from the directory that holds the downloaded tarball
tar zxvf extractapi-py-linux-x86_64-2019-2-6.tar.gz
cd hyperextractapi-py-linux-x86_64-release_2019_2.2019.2.6.199.r40e5865b/
python setup.py install
conda install -c conda-forge tableauserverclient modin pandas
pip install pandleau --no-deps
conda update -n base -c defaults conda
export MODIN_CPUS=4
export MODIN_ENGINE=ray
export MODIN_BACKEND=pandas
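If the exported variables do not reach the Python process (for example, when the Bash cell and the Python cell run in separate shells, which is an assumption about the runtime rather than something stated here), the same settings can be applied from Python before modin is imported. A minimal sketch using only the standard library; `apply_modin_settings` is a hypothetical helper name:

```python
import os

# Same values as the exports above.
MODIN_SETTINGS = {
    "MODIN_CPUS": "4",
    "MODIN_ENGINE": "ray",
    "MODIN_BACKEND": "pandas",
}

def apply_modin_settings(settings=MODIN_SETTINGS):
    """Copy the settings into os.environ and return the effective values.

    setdefault keeps any value that is already present in the environment
    instead of overwriting it.
    """
    for name, value in settings.items():
        os.environ.setdefault(name, value)
    return {name: os.environ[name] for name in settings}

effective = apply_modin_settings()
print(effective)

# Import modin only after the variables are in place:
# import modin.pandas as pd
```

Using `setdefault` means a value set by the shell still wins, so the sketch only fills in what is missing.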
How to use this runtime
Import the environment container with all the necessary packages

python-tableau-online
Download as Docker image from:
This image was imported from: docker.nextjournal.com/environment@sha256:5cf7d089d7a2d3a1955eb2f8a9f7259bc57f44cb9f0b8f1393aa97cbf8ce3639
First, populate your dataframe with data, e.g. from a large CSV file read in chunks to avoid memory errors.
import modin.pandas as pd

tfr = pd.read_csv("/path/to_csv", chunksize=500000, iterator=True)
df = pd.concat(tfr, ignore_index=True)
print(df)
Python
python-tableau-online
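Note that concatenating every chunk still builds the full dataframe in memory at the end. Where memory is tight, each chunk can be reduced (filtered or aggregated) before concatenation. The sketch below assumes plain pandas and uses a tiny in-memory CSV with made-up columns (`region`, `sales`) in place of the real file; `read_filtered` is a hypothetical helper, not part of the original notebook.

```python
import io
import pandas as pd

# Stand-in for a large file on disk; the columns are invented for illustration.
CSV_TEXT = """region,sales
north,10
south,0
north,5
east,0
south,7
"""

def read_filtered(source, chunksize=2):
    """Read a CSV in chunks, keeping only rows with non-zero sales."""
    kept = []
    for chunk in pd.read_csv(source, chunksize=chunksize):
        # Reduce each chunk before storing it, so discarded rows
        # never accumulate in memory.
        kept.append(chunk[chunk["sales"] > 0])
    return pd.concat(kept, ignore_index=True)

df = read_filtered(io.StringIO(CSV_TEXT))
print(df)
```

With a real file, `source` would be the CSV path and `chunksize` something closer to the 500000 rows used above.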
Now let's create a publishtotableau Python function that publishes a pandas dataframe as a data source in Tableau Online, so that we can reuse it later in code.
import modin.pandas as pd
import tableauserverclient as TSC
from pandleau import *

def publishtotableau(df, folder_path, projectid, datasource_name, auth_list, site='yoursite'):
    """
    Log in to Tableau Online and publish a pandas dataframe

    Assumes the following packages are imported:
        - tableauserverclient as TSC
        - pandleau import *
        - pandas as pd

    Args:
        df: dataframe to publish
        folder_path: folder to store temp.hyper file generated
        projectid: Tableau Server Project ID
        datasource_name: Name of the datasource to publish
        auth_list: List-like with username on index 0, password on index 1
        site: Tableau server site

    Returns:
        None
    """
    # Write the dataframe to a temporary .hyper extract
    pandleau(df).to_tableau(folder_path+'temp.hyper', add_index=False)
    # Sign in, then publish the extract, overwriting an existing data source of the same name
    tableau_auth = TSC.TableauAuth(auth_list[0], auth_list[1], site_id=site)
    server = TSC.Server('https://10ax.online.tableau.com/', use_server_version=True)
    with server.auth.sign_in(tableau_auth):
        mydatasourceitem = TSC.DatasourceItem(projectid, name=datasource_name)
        item = server.datasources.publish(mydatasourceitem, folder_path+'temp.hyper', 'Overwrite')
        print("{} successfully published with id: {}".format(item.name, item.id))
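The publishtotableau function builds the extract path as `folder_path+'temp.hyper'`, so `folder_path` has to end with a path separator, and `auth_list` carries the password as plain text. One way to prepare both arguments is sketched below. The helper names and the environment variable names `TABLEAU_USER` and `TABLEAU_PASSWORD` are hypothetical choices, not part of pandleau or tableauserverclient.

```python
import os

def with_trailing_sep(folder_path):
    """Return folder_path ending in a separator, as publishtotableau expects."""
    # Joining with an empty component appends a separator only if one is missing.
    return os.path.join(folder_path, "")

def auth_from_env(user_var="TABLEAU_USER", password_var="TABLEAU_PASSWORD"):
    """Build the [username, password] list from environment variables."""
    missing = [v for v in (user_var, password_var) if v not in os.environ]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return [os.environ[user_var], os.environ[password_var]]

# Example call (needs real credentials, a project ID and network access;
# the project ID and data source name here are placeholders):
# publishtotableau(df, with_trailing_sep("/results"), "your-project-id",
#                  "my_datasource", auth_from_env(), site="yoursite")
```

Reading credentials from the environment keeps them out of the notebook source, which matters once the notebook is published.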
The dataframe can also be populated with plain pandas instead of modin, again reading a large CSV file in chunks to avoid memory errors.
import pandas as pd
import tableauserverclient as TSC
from pandleau import *

tfr = pd.read_csv("/path/to_csv", chunksize=500000, iterator=True)
df = pd.concat(tfr, ignore_index=True)
print(df)