hops.featurestore_impl.featureframes package

Submodules

hops.featurestore_impl.featureframes.FeatureFrame module

class hops.featurestore_impl.featureframes.FeatureFrame.FeatureFrame(**kwargs)

Bases: object

Abstract feature frame (a dataframe of feature data to be saved to the feature store)

__init__(**kwargs)

Initialize state with information for reading/writing the featureframe

Args:
kwargs

key-value arguments to set the object variables

static get_featureframe(**kwargs)

Returns the appropriate FeatureFrame subclass depending on the data format

Args:
kwargs

key-value arguments with the featureframe state (must contain data_format key)

Returns:

FeatureFrame implementation

Raises:
ValueError

if the requested featureframe type is not supported
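
The dispatch on data_format can be sketched as follows. This is a minimal, self-contained illustration and not the hops implementation: the classes SketchFeatureFrame, SketchCSVFeatureFrame and SketchParquetFeatureFrame, the function get_featureframe_sketch, the format strings "csv" and "parquet", and the choice to raise ValueError when data_format is missing are all assumptions made for the example.

```python
class SketchFeatureFrame:
    """Hypothetical stand-in for the FeatureFrame base class."""

    def __init__(self, **kwargs):
        # Store every key-value argument as an object variable.
        for key, value in kwargs.items():
            setattr(self, key, value)


class SketchCSVFeatureFrame(SketchFeatureFrame):
    """Hypothetical stand-in for a CSV feature frame."""


class SketchParquetFeatureFrame(SketchFeatureFrame):
    """Hypothetical stand-in for a Parquet feature frame."""


# Assumed mapping from data_format value to implementing class.
_SKETCH_FORMATS = {
    "csv": SketchCSVFeatureFrame,
    "parquet": SketchParquetFeatureFrame,
}


def get_featureframe_sketch(**kwargs):
    """Return the subclass instance that matches kwargs['data_format']."""
    data_format = kwargs.get("data_format")
    if data_format not in _SKETCH_FORMATS:
        # Unknown (or missing) formats are rejected with ValueError.
        raise ValueError("unsupported data format: {!r}".format(data_format))
    return _SKETCH_FORMATS[data_format](**kwargs)
```

Keeping the format-to-class mapping in one dictionary means that supporting a new format only requires a new subclass and one new entry.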

abstract read_featureframe()

Abstract method for reading a featureframe as a training dataset in HopsFS. Implemented by subclasses.

abstract write_featureframe()

Abstract method for writing a featureframe as a training dataset in HopsFS. Implemented by subclasses.
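
The abstract read/write contract can be illustrated with Python's abc module. The sketch below is an assumption about how such a base class might look, not the hops source: AbstractFrameSketch and InMemoryFrameSketch are hypothetical names, the path and df attributes are invented for the example, a dictionary stands in for HopsFS, and the spark parameter follows the subclass signatures listed in this module.

```python
from abc import ABC, abstractmethod


class AbstractFrameSketch(ABC):
    """Hypothetical abstract base with the read/write contract."""

    def __init__(self, **kwargs):
        # Set object variables from the key-value arguments.
        for key, value in kwargs.items():
            setattr(self, key, value)

    @abstractmethod
    def read_featureframe(self, spark):
        """Read the training dataset; subclasses must implement this."""

    @abstractmethod
    def write_featureframe(self):
        """Write the training dataset; subclasses must implement this."""


class InMemoryFrameSketch(AbstractFrameSketch):
    """Hypothetical subclass that stores data in a dict instead of HopsFS."""

    _storage = {}  # shared by all instances, keyed by path

    def read_featureframe(self, spark):
        # The spark argument is unused in this in-memory stand-in.
        return self._storage[self.path]

    def write_featureframe(self):
        self._storage[self.path] = self.df
```

Because both methods are marked abstract, instantiating AbstractFrameSketch directly raises TypeError, while a subclass that implements both can be created and used.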

class hops.featurestore_impl.featureframes.FeatureFrame.AvroFeatureFrame(**kwargs)

Bases: hops.featurestore_impl.featureframes.FeatureFrame.FeatureFrame

FeatureFrame implementation for Avro training datasets

__init__(**kwargs)

Initialize featureframe state using the parent class constructor

read_featureframe(spark)

Reads a training dataset in avro format from HopsFS

Args:
spark

the spark session

Returns:

dataframe with the data of the training dataset

Raises:
TrainingDatasetNotFound

if the requested training dataset could not be found

write_featureframe()

Writes a dataframe of data as a training dataset on HDFS in the avro format

Returns:

None

class hops.featurestore_impl.featureframes.FeatureFrame.ORCFeatureFrame(**kwargs)

Bases: hops.featurestore_impl.featureframes.FeatureFrame.FeatureFrame

FeatureFrame implementation for ORC training datasets

__init__(**kwargs)

Initialize featureframe state using the parent class constructor

read_featureframe(spark)

Reads a training dataset in orc format from HopsFS

Args:
spark

the spark session

Returns:

dataframe with the data of the training dataset

Raises:
TrainingDatasetNotFound

if the requested training dataset could not be found

write_featureframe()

Writes a dataframe of data as a training dataset on HDFS in the orc format

Returns:

None

class hops.featurestore_impl.featureframes.FeatureFrame.TFRecordsFeatureFrame(**kwargs)

Bases: hops.featurestore_impl.featureframes.FeatureFrame.FeatureFrame

FeatureFrame implementation for TFRecords training datasets

__init__(**kwargs)

Initialize featureframe state using the parent class constructor

read_featureframe(spark)

Reads a training dataset in tfrecords format from HopsFS

Args:
spark

the spark session

Returns:

dataframe with the data of the training dataset

Raises:
TrainingDatasetNotFound

if the requested training dataset could not be found

write_featureframe()

Writes a dataframe of data as a training dataset on HDFS in the tfrecords format

Returns:

None

Raises:
ValueError

if the user supplied a write mode that is not supported
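
A write-mode check of the kind that leads to this ValueError might look like the sketch below. The set of supported modes is not listed in this reference, so the tuple SUPPORTED_WRITE_MODES_SKETCH (containing only "overwrite") and the function validate_write_mode_sketch are assumptions made purely for illustration.

```python
# Assumed set of accepted write modes; the real set is not documented here.
SUPPORTED_WRITE_MODES_SKETCH = ("overwrite",)


def validate_write_mode_sketch(write_mode):
    """Return write_mode unchanged, or raise ValueError if it is not accepted."""
    if write_mode not in SUPPORTED_WRITE_MODES_SKETCH:
        raise ValueError(
            "write mode {!r} is not supported, expected one of {}".format(
                write_mode, SUPPORTED_WRITE_MODES_SKETCH
            )
        )
    return write_mode
```

Validating the mode before any data is written means an unsupported mode fails fast instead of leaving a partially written dataset.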

class hops.featurestore_impl.featureframes.FeatureFrame.NumpyFeatureFrame(**kwargs)

Bases: hops.featurestore_impl.featureframes.FeatureFrame.FeatureFrame

FeatureFrame implementation for Numpy training datasets

__init__(**kwargs)

Initialize featureframe state using the parent class constructor

read_featureframe(spark)

Reads a training dataset in numpy format from HopsFS

Args:
spark

the spark session

Returns:

dataframe with the data of the training dataset

Raises:
TrainingDatasetNotFound

if the requested training dataset could not be found

CouldNotConvertDataframe

if the numpy dataset could not be converted to a spark dataframe

NumpyDatasetFormatNotSupportedForExternalTrainingDatasets

if the user tries to read an external training dataset in the .npy format.

write_featureframe()

Writes a dataframe of data as a training dataset on HDFS in the npy format

Returns:

None

Raises:
ValueError

if the user supplied a write mode that is not supported

NumpyDatasetFormatNotSupportedForExternalTrainingDatasets

if the user tries to write an external training dataset in the .npy format.
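
The restriction on external training datasets can be pictured as a guard that runs before any read or write. The sketch below is hypothetical: the exception class is a local stand-in that only reuses the documented name, and the function check_npy_allowed_sketch and its is_external flag are assumptions, since this reference does not say how a featureframe records that a dataset is external.

```python
class NumpyDatasetFormatNotSupportedForExternalTrainingDatasets(Exception):
    """Local stand-in for the exception of the same name in hops."""


def check_npy_allowed_sketch(is_external):
    """Raise if the .npy format is requested for an external training dataset."""
    if is_external:
        raise NumpyDatasetFormatNotSupportedForExternalTrainingDatasets(
            "the .npy format is not supported for external training datasets"
        )
    # Non-external datasets pass the guard and the caller may proceed.
    return True
```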

class hops.featurestore_impl.featureframes.FeatureFrame.HDF5FeatureFrame(**kwargs)

Bases: hops.featurestore_impl.featureframes.FeatureFrame.FeatureFrame

FeatureFrame implementation for HDF5 training datasets

__init__(**kwargs)

Initialize featureframe state using the parent class constructor

read_featureframe(spark)

Reads a training dataset in hdf5 format from HopsFS

Args:
spark

the spark session

Returns:

dataframe with the data of the training dataset

Raises:
TrainingDatasetNotFound

if the requested training dataset could not be found

CouldNotConvertDataframe

if the hdf5 dataset could not be converted to a spark dataframe

HDF5DatasetFormatNotSupportedForExternalTrainingDatasets

if the user tries to read an external training dataset in the .hdf5 format.

write_featureframe()

Writes a dataframe of data as a training dataset on HDFS in the hdf5 format

Returns:

None

Raises:
ValueError

if the user supplied a write mode that is not supported

HDF5DatasetFormatNotSupportedForExternalTrainingDatasets

if the user tries to write an external training dataset in the .hdf5 format.

class hops.featurestore_impl.featureframes.FeatureFrame.PetastormFeatureFrame(**kwargs)

Bases: hops.featurestore_impl.featureframes.FeatureFrame.FeatureFrame

FeatureFrame implementation for Petastorm training datasets

__init__(**kwargs)

Initialize featureframe state using the parent class constructor

read_featureframe(spark)

Reads a training dataset in petastorm format from HopsFS

Args:
spark

the spark session

Returns:

dataframe with the data of the training dataset

Raises:
TrainingDatasetNotFound

if the requested training dataset could not be found

write_featureframe()

Writes a dataframe of data as a training dataset on HDFS in the petastorm format

Returns:

None

Raises:
ValueError

if no petastorm schema was provided

class hops.featurestore_impl.featureframes.FeatureFrame.ImageFeatureFrame(**kwargs)

Bases: hops.featurestore_impl.featureframes.FeatureFrame.FeatureFrame

FeatureFrame implementation for image training datasets

__init__(**kwargs)

Initialize featureframe state using the parent class constructor

read_featureframe(spark)

Reads a training dataset in image format from HopsFS

Args:
spark

the spark session

Returns:

dataframe with the data of the training dataset

Raises:
TrainingDatasetNotFound

if the requested training dataset could not be found

write_featureframe()

Writes a dataframe of data as a training dataset on HDFS in the image format

Returns:

None

Raises:
ValueError

always raised when this method is called, because writing datasets in the “image” format is not supported with Spark

class hops.featurestore_impl.featureframes.FeatureFrame.ParquetFeatureFrame(**kwargs)

Bases: hops.featurestore_impl.featureframes.FeatureFrame.FeatureFrame

FeatureFrame implementation for Parquet training datasets

__init__(**kwargs)

Initialize featureframe state using the parent class constructor

read_featureframe(spark)

Reads a training dataset in Parquet format from HopsFS

Args:
spark

the spark session

Returns:

dataframe with the data of the training dataset

Raises:
TrainingDatasetNotFound

if the requested training dataset could not be found

write_featureframe()

Writes a dataframe of data as a training dataset on HDFS in the Parquet format

Returns:

None

class hops.featurestore_impl.featureframes.FeatureFrame.TSVFeatureFrame(**kwargs)

Bases: hops.featurestore_impl.featureframes.FeatureFrame.FeatureFrame

FeatureFrame implementation for TSV training datasets

__init__(**kwargs)

Initialize featureframe state using the parent class constructor

read_featureframe(spark)

Reads a training dataset in TSV format from HopsFS

Args:
spark

the spark session

Returns:

dataframe with the data of the training dataset

Raises:
TrainingDatasetNotFound

if the requested training dataset could not be found

write_featureframe()

Writes a dataframe of data as a training dataset on HDFS in the TSV format

Returns:

None

class hops.featurestore_impl.featureframes.FeatureFrame.CSVFeatureFrame(**kwargs)

Bases: hops.featurestore_impl.featureframes.FeatureFrame.FeatureFrame

FeatureFrame implementation for CSV training datasets

__init__(**kwargs)

Initialize featureframe state using the parent class constructor

read_featureframe(spark)

Reads a training dataset in CSV format from HopsFS

Args:
spark

the spark session

Returns:

dataframe with the data of the training dataset

Raises:
TrainingDatasetNotFound

if the requested training dataset could not be found

write_featureframe()

Writes a dataframe of data as a training dataset on HDFS in the CSV format

Returns:

None

Module contents