With Lumipy, you can use the lumipy.provider submodule to create data providers that connect Python data science applications to Luminesce.
Prerequisites and setup
Prerequisites:
.NET SDK at version 6.0 or above. Mac users should use an x86 version; see Appendix A: .NET for Mac users.
Before you begin creating Python providers, ensure you have set up your config with a Personal Access Token (PAT). On Windows, we recommend running Windows PowerShell as administrator. Send the following command, replacing <your-domain> and <your-access-token> with your own values:
$ lumipy config add --domain=<your-domain> --token=<your-access-token>
Note: By default, providers are only visible to you. You can make a provider visible to all users in your domain by adding --user=global and --whitelist-me to the above command.
Send the following command to run a provider and complete your setup:
$ lumipy run demo
You should see provider information, followed by a browser window opening and prompting you to sign in. After signing in, you should see the following:
Providers are ready to use.
Use ctrl+c or the stop button in jupyter to shut down
Building and running provider classes
Once you have completed the above setup, you can begin creating a Python provider. This consists of two steps:
Step 1: Building a provider class
To build a provider class you must first import the submodule:
import lumipy.provider as lp
All provider classes must be subclasses of lumipy.provider.BaseProvider and implement the following methods:
__init__(): Declare metadata such as the column and parameter content of the provider. These are built from the corresponding metadata objects under lumipy.provider and supplied to super().__init__().
get_data(): Produce the data that is returned by the provider. Takes in a limit value, a filter representation and parameter values from the query being processed, and returns a dataframe.
Note: If you are creating a provider from a Pandas dataframe, you can use Lumipy's built-in PandasProvider class instead; see Building providers for Pandas dataframes below.
Step 2: Running the provider
To run a provider, use the command line interface (CLI) to run Python files containing provider objects, CSV files, and more, as shown in the example below. Alternatively, you can run an instance of the provider class in ProviderManager. This starts up the provider, ready for querying in Luminesce; running via the CLI performs this step automatically.
Example
For example, you could build and run a provider for simulating a set of coin flips, with two columns Label and Result, and one parameter Probability with a default value of 0.5. To do this, you might:
1. Import the lumipy.provider submodule and other required packages for this example.
2. Create a provider class that inherits from the BaseProvider class. This class should define the following:
   An __init__() method declaring the column and parameter metadata for the provider. This, alongside a name for the provider, is then supplied to super().__init__().
   A get_data() method for getting and returning data from the provider. The context argument is used to specify information for query where clauses, limits and parameters. For this example:
      The limit value for the number of rows to return is set to 100 if a query doesn't specify a limit.
      Parameter values are retrieved from the parameters dictionary and must be within a specified range; an error is thrown in Python if the value is out of range and this is reported back in the progress log and query status.
      The data - in this example coin flips - is generated and a dataframe is returned.
3. Instantiate the provider object using the CoinFlips class.
# 1. Import submodule
import lumipy.provider as lp
from pandas import DataFrame
from typing import Union, Iterator
import numpy as np

# 2. Create a provider class
class CoinFlips(lp.BaseProvider):

    def __init__(self):
        columns = [
            lp.ColumnMeta('Label', lp.DType.Text),
            lp.ColumnMeta('Result', lp.DType.Int),
        ]
        params = [lp.ParamMeta('Probability', lp.DType.Double, default_value=0.5)]
        # Supply above to super().__init__
        super().__init__('example.coin.flips', columns, params)

    def get_data(self, context) -> Union[DataFrame, Iterator[DataFrame]]:
        # If no limit is given, default to 100 rows.
        limit = context.limit()
        if limit is None:
            limit = 100

        # Get parameter value from params dict and throw an error if the parameter is out of given range
        p = context.get('Probability')
        if not 0 <= p <= 1:
            raise ValueError(f'Probability must be between 0 and 1. Was {p}.')

        # Generate the coin flips and return.
        return DataFrame({'Label': f'Flip {i}', 'Result': np.random.binomial(1, p)} for i in range(limit))

# 3. Instantiate the provider object
coin_flips = CoinFlips()
You can then save the code above as a Python file, coinflips.py for example, and use the command line interface to run your Python provider on the fly:
$ lumipy run <path/to/>coinflips.py
You can run providers in this way for:
.py files containing provider objects
.csv files
Directories containing .csv and .py files
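Before running the file, it can help to check the row-generation logic of the coin flips example on its own. The sketch below mirrors the body of get_data() without importing lumipy. StubContext and coin_flip_rows are hypothetical names introduced for illustration only; the stub mimics just the two calls the example makes, limit() and get(name). It uses the standard library random module in place of numpy and returns a list of row dictionaries rather than a dataframe. Unlike a real provider, the stub does not apply the default value of 0.5, so Probability is passed explicitly.

```python
import random


class StubContext:
    """Hypothetical stand-in for the context argument; not part of lumipy.

    It mimics only the two calls used in the example: limit() and get(name).
    """

    def __init__(self, limit=None, params=None):
        self._limit = limit
        self._params = params or {}

    def limit(self):
        return self._limit

    def get(self, name):
        return self._params.get(name)


def coin_flip_rows(context, rng=None):
    """Mirror the example's get_data() logic, returning a list of row dicts."""
    rng = rng or random.Random()

    # Fall back to 100 rows when the query gives no limit.
    limit = context.limit()
    if limit is None:
        limit = 100

    # Reject probabilities outside the range 0 to 1.
    p = context.get('Probability')
    if not 0 <= p <= 1:
        raise ValueError(f'Probability must be between 0 and 1. Was {p}.')

    # One Bernoulli draw per row: 1 with probability p, otherwise 0.
    return [
        {'Label': f'Flip {i}', 'Result': int(rng.random() < p)}
        for i in range(limit)
    ]


rows = coin_flip_rows(StubContext(params={'Probability': 0.5}))
print(len(rows))
```

Because the stub is plain Python, the same checks can be run in a unit test, for instance confirming that an out-of-range Probability raises ValueError.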
Once the provider is running, you can query it via Luminesce until you choose to shut it down.
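As a rough illustration of what such a query might look like, the sketch below assembles query text for the example.coin.flips provider in a Python string. build_coin_flip_query is a hypothetical helper, not part of Lumipy, and the sketch assumes the Probability parameter is supplied through the where clause and the row limit through a limit clause. The resulting text is intended to be pasted into a Luminesce query editor.

```python
def build_coin_flip_query(probability: float, limit: int) -> str:
    """Return query text for the example.coin.flips provider (hypothetical helper)."""
    return (
        "select Label, Result\n"
        "from example.coin.flips\n"
        f"where Probability = {probability}\n"
        f"limit {limit}"
    )


# Print the query text so it can be copied into a query editor.
print(build_coin_flip_query(0.7, 10))
```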
Building providers for Pandas dataframes
You can use the built-in PandasProvider class to easily pass a Pandas dataframe into the provider manager and make its data available in Luminesce.
To do this, pass a dataframe object to the PandasProvider class and run via the CLI. For example, to build a provider for one of the transaction source files from this tutorial, you might:
1. Read data into Pandas from a CSV file.
2. Pass the Pandas dataframe object and a provider name to the built-in PandasProvider class.
3. Run the provider via the CLI.
# 1. Read data from CSV
import pandas as pd
import lumipy.provider as lp
transactions_df = pd.read_csv("data/transactions.csv")
# 2. Pass into PandasProvider class
my_df = lp.PandasProvider(transactions_df, 'transactions')
$ lumipy run <path/to/>myFile.py
Once running, this provider will appear as pandas.transactions in Luminesce. You can query it until you choose to shut it down. Note you can set name_root = None in PandasProvider to change the provider name prefix.
If the input to PandasProvider is not a dataframe, the constructor passes the value into pandas.read_csv and uses the resultant dataframe. You can use this approach to build a provider from:
A local filepath
A URL
An IO stream
Anything else the function supports
For example, you could modify the code from steps 1-3 above to pass a CSV file into PandasProvider before running via the CLI as usual:
import lumipy.provider as lp
file = "data/transactions.csv"
my_df = lp.PandasProvider(file, 'CreateDataframeForMe')
The provider runs as usual, allowing you to query the resultant dataframe via Luminesce.
Appendix A: .NET for Mac users
Mac users with an Apple silicon processor may experience the following error when attempting to run a provider locally:
Unable to load shared library 'SQLite.Interop.dll' or one of its dependencies.
You can follow these steps to overcome this issue:
1. Install Rosetta 2. You can do this via the command line:
$ softwareupdate --install-rosetta
Once installed, send the following command to locate Rosetta:
$ find / -name rosetta
Add the resultant path to your $PATH environment variable. Read the Apple documentation on how to do this.
2. Install the x86 version of .NET 6.0 or above. To do this, you can download the .NET 6.0 x64 installer for macOS and copy it to a folder on your $PATH environment variable. Alternatively, you can send the following command:
$ arch -x86_64 brew install dotnet-sdk
You can send the following command to check that the file path is correct and contains an x64 folder, for example /usr/local/share/dotnet/x64/shared/Microsoft.AspNetCore.App:
$ dotnet --list-runtimes
If the path does not contain an x64 folder, uninstall .NET and reinstall using the .NET x64 installer for macOS as mentioned in step 2 above. You should also ensure there are no .NET Arm64 versions on your path.
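The path check can also be scripted. The following is a minimal sketch, assuming each runtime line printed by dotnet --list-runtimes ends with its install path in square brackets. non_x64_runtimes and check_installed_runtimes are hypothetical helper names, and the sample output is made up for illustration, including the version numbers.

```python
import subprocess
from typing import List


def non_x64_runtimes(list_runtimes_output: str) -> List[str]:
    """Return the runtime lines whose install path lacks an x64 folder."""
    flagged = []
    for line in list_runtimes_output.splitlines():
        line = line.strip()
        # Skip anything that does not end with a bracketed install path.
        if not line.endswith(']') or '[' not in line:
            continue
        path = line[line.rindex('[') + 1:-1]
        if 'x64' not in path.split('/'):
            flagged.append(line)
    return flagged


def check_installed_runtimes() -> List[str]:
    """Run `dotnet --list-runtimes` and flag any runtimes outside an x64 folder."""
    result = subprocess.run(
        ['dotnet', '--list-runtimes'],
        capture_output=True, text=True, check=True,
    )
    return non_x64_runtimes(result.stdout)


# Sample output for illustration only; the version numbers are made up.
sample = (
    "Microsoft.AspNetCore.App 6.0.0 "
    "[/usr/local/share/dotnet/x64/shared/Microsoft.AspNetCore.App]\n"
    "Microsoft.NETCore.App 6.0.0 "
    "[/usr/local/share/dotnet/shared/Microsoft.NETCore.App]\n"
)
print(non_x64_runtimes(sample))
```

On a machine with .NET installed, calling check_installed_runtimes() should return an empty list when every runtime sits under an x64 folder.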