In April last year, we unveiled the Edge Impulse Python SDK to help programmers and machine learning (ML) practitioners interact with Edge Impulse by uploading and profiling custom ML models. Today, we are announcing a new set of features for the Python SDK that allow you to upload and download data directly to and from your Edge Impulse projects. This should help make data uploading, augmentation, and synthesis easier!
The fastest way to get started with the new features is to check out this Jupyter notebook tutorial. You can see the latest release here on PyPI.
Before we provide details for these new functions, we want to describe how features will be added to the Python SDK going forward.
Experimental submodule
Rather than simply release new features and hope for the best, we are emulating Google’s success with the TensorFlow package by adding an “experimental” submodule. By releasing new features under this experimental submodule, we can provide a form of beta testing. We want your feedback on the usefulness of new features: please let us know if you find any bugs, how you enjoy working with the Python SDK, and if you have any suggestions for improvements!
Going forward, new features will first be released in this “experimental” submodule. If you have not done so already, install (or upgrade) the Edge Impulse Python SDK:
pip install --upgrade edgeimpulse
In your Python program, simply import the SDK and set the API key from your project:
import edgeimpulse as ei
ei.API_KEY = "ei_3881..."
All pre-release features can be found in the “experimental” submodule. For example, here is how you might upload a directory to your Edge Impulse project:
ei.experimental.data.upload_directory(directory="dataset")
While we strive to release code that is thoroughly tested and reviewed, the interface (i.e. the function names, arguments, and return values) in the experimental submodule might change before the full release. Once a set of features is moved out of “experimental,” the interface should not change. This is where you come in: help us make the Python SDK better by letting us know about any issues or requests you may have on the Edge Impulse forums.
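For example, if you want a script to keep working after a feature graduates out of “experimental,” you can look for a stable location first and fall back to the experimental submodule. The snippet below is only a sketch: the stable path (ei.data) is an assumption about where graduated features might eventually live, so check the release notes for the actual location.
import edgeimpulse as ei

# Sketch only: "ei.data" is an assumption about where this function might
# live after it graduates out of the experimental submodule
try:
    upload_directory = ei.data.upload_directory
except AttributeError:
    upload_directory = ei.experimental.data.upload_directory

resp = upload_directory(directory="dataset")
print(resp)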
Data upload
Dealing with data is often a painful process. Capturing, labeling, storing, and formatting data can require custom engineering effort and often cause headaches. Out of the box, Edge Impulse offers myriad ways of moving data from your device to a project:
- Collecting data directly from a fully supported development board
- Uploading data directly into Edge Impulse with the graphical user interface
- Using the CLI-based data forwarder to upload data
- Scripting data uploads using the ingestion API
The Python SDK now supports uploading data in a variety of formats, including files accepted by the ingestion service, NumPy arrays, and pandas DataFrames. While you already have several options for getting data into your project, the Python SDK adds another avenue to make the process easier.
To upload all files in a directory named “dataset,” you can call the following function. Note that the files must be in one of the formats accepted by the Edge Impulse ingestion service.
resp = ei.experimental.data.upload_directory(directory="dataset")
print(resp)
You can also upload individual files directly by opening file handles, wrapping them with some metadata (such as the label and category), and passing them to the upload_samples() function. For example, let’s say you have some CSV files named “001.csv” and “002.csv” that contain some time series data:
# Add metadata to the open file handles
my_samples = [
    {
        "filename": "001.csv",
        "data": open("001.csv", "rb"),
        "category": "training",
        "label": "idle",
        "metadata": {
            "source": "accelerometer",
            "collection site": "desk",
        },
    },
    {
        "filename": "002.csv",
        "data": open("002.csv", "rb"),
        "category": "training",
        "label": "wave",
        "metadata": {
            "source": "accelerometer",
            "collection site": "desk",
        },
    },
]
# Wrap the samples in instances of the Sample class
samples = [ei.experimental.data.Sample(**i) for i in my_samples]
# Upload samples to your project
resp = ei.experimental.data.upload_samples(samples)
print(resp)
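The response object lets you confirm whether every sample was accepted by the ingestion service. The snippet below is only a sketch: the successes and fails attribute names are assumptions, so check the API reference guide for the exact fields on the response.
# Sketch: check the upload response for failures. The "successes" and
# "fails" attribute names are assumptions; see the API reference for the
# actual response fields.
fails = getattr(resp, "fails", [])
if fails:
    print(f"{len(fails)} samples failed to upload")
else:
    print("All samples uploaded successfully")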
Similarly, you can upload NumPy arrays directly. Note that the array should have the shape (num_samples, num_time_points, num_sensors). Once again, you need to provide some metadata, such as the names of your sensors and the sample rate (assuming time series data).
import numpy as np
# Create 2 samples, each with 3 axes of accelerometer data
samples = np.array(
    [
        [  # sample 1
            [8.81, 0.03, 1.21],
            [9.83, 1.04, 1.27],
            [9.12, 0.03, 1.23],
            [9.14, 2.01, 1.25],
        ],
        [  # sample 2
            [8.81, 0.03, 1.21],
            [9.12, 0.03, 1.23],
            [9.14, 2.01, 1.25],
            [9.14, 2.01, 1.25],
        ],
    ]
)
# The labels for each sample
labels = ["up", "down"]
# The sensors used in the samples
sensors = [
    {"name": "accelX", "units": "m/s2"},
    {"name": "accelY", "units": "m/s2"},
    {"name": "accelZ", "units": "m/s2"},
]
# Upload samples to your Edge Impulse project
resp = ei.experimental.data.upload_numpy(
    sample_rate=100,
    data=samples,
    labels=labels,
    category="training",
    sensors=sensors,
)
print(resp)
[Edit Jan 22, 2024] Finally, we aim to support DataFrames from various data analysis packages, such as pandas, Dask, Polars, and Modin. Under the hood, we use duck typing to see if certain functions are available in the packages or objects. Currently, we support pandas and Dask (so long as you import them as “pd”).
In future releases, we plan to add support for more data packages. Please let us know on the forums which packages you would like to see support for!
# Uncomment one of the following
# import pandas as pd
# import dask.dataframe as pd
# import polars as pd
# Construct non-time series data, where each row is a different sample
data = [
    ["desk", "training", "One", -9.81, 0.03, 0.21],
    ["field", "training", "Two", -9.56, 5.34, 1.21],
]
columns = ["loc", "category", "label", "accX", "accY", "accZ"]
# Wrap the data in a DataFrame
df = pd.DataFrame(data, columns=columns)
# Upload samples to your Edge Impulse project
resp = ei.experimental.data.upload_pandas_dataframe(
    df,
    feature_cols=["accX", "accY", "accZ"],
    label_col="label",
    category_col="category",
    metadata_cols=["loc"],
)
print(resp)
DataFrames can store data in a variety of ways: one sample per row, a 1D time series per row, a multidimensional time series per DataFrame, and so on. We support uploading several of these formats; see the Jupyter notebook tutorial for additional DataFrame examples.
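For instance, a single time-series sample can live in its own DataFrame, with one row per reading and one column per sensor axis. The snippet below is a rough sketch of how such a sample might be uploaded; the upload_pandas_sample call and its arguments are illustrative and may differ from the actual interface, so refer to the Jupyter notebook tutorial for the exact function names and parameters.
import pandas as pd

# One time-series sample: each row is a reading, each column a sensor axis
df_sample = pd.DataFrame(
    [
        [-9.81, 0.03, 0.21],
        [-9.83, 0.04, 0.27],
        [-9.12, 0.03, 0.23],
    ],
    columns=["accX", "accY", "accZ"],
)

# Sketch: the function name and arguments here are assumptions; see the
# Jupyter notebook tutorial for the exact interface
resp = ei.experimental.data.upload_pandas_sample(
    df_sample,
    label="idle",
    sample_rate=100,
    category="training",
)
print(resp)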
Data download
In addition to uploading data, you can also download individual samples from your project. To download your entire dataset, however, we recommend using the “Export” tab in your project’s dashboard: it archives the full dataset and downloads a .zip file to your computer, which is much faster than fetching samples one at a time.
All samples in an Edge Impulse project are assigned a unique ID number and a filename derived from the original uploaded filename. Note that filenames do not include the original extension, and multiple samples can share the same filename. As a result, we recommend giving your files unique names to make them easier to find.
To download samples, first look up their sample IDs and then request the associated samples. Note the lack of a file extension in the filename! Assuming you uploaded only one file named “001.csv,” the following should download just that sample:
# Obtain sample IDs given a filename
infos = ei.experimental.data.get_sample_ids(filename="001")
ids = [info.sample_id for info in infos]
# Download samples
samples = ei.experimental.data.download_samples_by_ids(sample_ids=ids)
print(samples)
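Each downloaded sample carries the stored data alongside its metadata. As a rough sketch (the filename and data fields mirror the Sample class used for uploading, but whether data comes back as raw bytes or a file-like object is an assumption, so check the API reference), you could write the downloaded samples back to disk:
# Sketch: save downloaded samples to disk. Assumes each sample exposes
# "filename" and "data" fields like the Sample class used for uploads.
for sample in samples:
    # The data may come back as raw bytes or a file-like object
    payload = sample.data.read() if hasattr(sample.data, "read") else sample.data
    with open(f"{sample.filename}.downloaded", "wb") as f:
        f.write(payload)
    print(f"Wrote {sample.filename}.downloaded")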
Getting started
The best way to get started using the new data features is to visit our example Jupyter notebook. Feel free to run the notebook locally or in Colab to see how everything works.
The Python SDK API reference guide can be found here.
Enjoying the Python SDK? Found a bug? Want to make a suggestion? Let us know on the forums!