hops.featurestore_impl.featureframes package

Submodules

hops.featurestore_impl.featureframes.FeatureFrame module

class hops.featurestore_impl.featureframes.FeatureFrame.AvroFeatureFrame(**kwargs)

Bases: hops.featurestore_impl.featureframes.FeatureFrame.FeatureFrame

FeatureFrame implementation for Avro training datasets

__init__(**kwargs)

Initialize featureframe state using the parent class constructor

read_featureframe(spark)

Reads a training dataset in avro format from HopsFS

Args:
spark: the spark session
Returns:
dataframe with the data of the training dataset
Raises:
TrainingDatasetNotFound: if the requested training dataset could not be found

write_featureframe()

Writes a dataframe of data as a training dataset on HDFS in the avro format

Returns:
None

class hops.featurestore_impl.featureframes.FeatureFrame.CSVFeatureFrame(**kwargs)

Bases: hops.featurestore_impl.featureframes.FeatureFrame.FeatureFrame

FeatureFrame implementation for CSV training datasets

__init__(**kwargs)

Initialize featureframe state using the parent class constructor

read_featureframe(spark)

Reads a training dataset in CSV format from HopsFS

Args:
spark: the spark session
Returns:
dataframe with the data of the training dataset
Raises:
TrainingDatasetNotFound: if the requested training dataset could not be found

write_featureframe()

Writes a dataframe of data as a training dataset on HDFS in the CSV format

Returns:
None

class hops.featurestore_impl.featureframes.FeatureFrame.FeatureFrame(**kwargs)

Bases: object

Abstract feature frame (a dataframe of feature data to be saved to the feature store)

__init__(**kwargs)

Initialize state with information for reading/writing the featureframe

Args:
kwargs: key-value arguments to set the object variables

static get_featureframe(**kwargs)

Returns the appropriate FeatureFrame subclass depending on the data format

Args:
kwargs: key-value arguments with the featureframe state (must contain data_format key)
Returns:
FeatureFrame implementation
Raises:
ValueError: if the requested featureframe type is not supported

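
The dispatch described for get_featureframe can be illustrated with a small, self-contained sketch. It is not the hops implementation: the Sketch* class names, the "csv" and "parquet" format strings, the registry dictionary, and the path argument are all assumptions made for illustration. Only the data_format key and the ValueError for unsupported formats come from the description above.

```python
class SketchFeatureFrame:
    """Illustrative stand-in for the abstract FeatureFrame base class."""

    def __init__(self, **kwargs):
        # Store every keyword argument as an attribute on the instance.
        self.__dict__.update(kwargs)

    @staticmethod
    def get_featureframe(**kwargs):
        """Return an instance of the class registered for kwargs['data_format']."""
        data_format = kwargs.get("data_format")
        frame_cls = _FORMAT_TO_CLASS.get(data_format)
        if frame_cls is None:
            raise ValueError("unsupported data_format: {!r}".format(data_format))
        return frame_cls(**kwargs)


class SketchCSVFeatureFrame(SketchFeatureFrame):
    """Stand-in for a CSV-specific implementation."""


class SketchParquetFeatureFrame(SketchFeatureFrame):
    """Stand-in for a Parquet-specific implementation."""


# Hypothetical registry; the real format identifiers may differ.
_FORMAT_TO_CLASS = {
    "csv": SketchCSVFeatureFrame,
    "parquet": SketchParquetFeatureFrame,
}

# Usage: the caller passes the format and never names the subclass directly.
frame = SketchFeatureFrame.get_featureframe(
    data_format="csv", path="/hypothetical/path"
)
```

Keeping the format-to-class mapping in one place means that supporting a new format only requires a new subclass and a new registry entry, while callers stay unchanged.
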
read_featureframe()

Abstract method for reading a featureframe as a training dataset in HopsFS. Implemented by subclasses.

write_featureframe()

Abstract method for writing a featureframe as a training dataset in HopsFS. Implemented by subclasses.
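
The contract between the abstract base and its subclasses can be sketched as follows. This is only one common way to express it in Python, not a description of how hops does it: raising NotImplementedError in the base, the InMemoryFeatureFrame subclass, the path and df attributes, and the use of LookupError are all hypothetical. The sketch also gives the base read method the same spark parameter that the documented subclasses accept.

```python
class SketchFeatureFrame:
    """Illustrative abstract base: subclasses must override both methods."""

    def __init__(self, **kwargs):
        # Store every keyword argument as an attribute on the instance.
        self.__dict__.update(kwargs)

    def read_featureframe(self, spark):
        raise NotImplementedError("implemented by subclasses")

    def write_featureframe(self):
        raise NotImplementedError("implemented by subclasses")


class InMemoryFeatureFrame(SketchFeatureFrame):
    """Hypothetical subclass that keeps datasets in a dict instead of HopsFS."""

    _storage = {}  # shared by all instances, keyed by path

    def read_featureframe(self, spark):
        # The spark argument is accepted for interface parity but unused here.
        try:
            return self._storage[self.path]
        except KeyError:
            raise LookupError("no dataset stored at {!r}".format(self.path))

    def write_featureframe(self):
        self._storage[self.path] = self.df


# Usage: write through one instance, read back through another.
InMemoryFeatureFrame(path="td_1", df=[1, 2, 3]).write_featureframe()
data = InMemoryFeatureFrame(path="td_1").read_featureframe(spark=None)
```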

class hops.featurestore_impl.featureframes.FeatureFrame.HDF5FeatureFrame(**kwargs)

Bases: hops.featurestore_impl.featureframes.FeatureFrame.FeatureFrame

FeatureFrame implementation for HDF5 training datasets

__init__(**kwargs)

Initialize featureframe state using the parent class constructor

read_featureframe(spark)

Reads a training dataset in hdf5 format from HopsFS

Args:
spark: the spark session
Returns:
dataframe with the data of the training dataset
Raises:
TrainingDatasetNotFound: if the requested training dataset could not be found
CouldNotConvertDataframe: if the hdf5 dataset could not be converted to a spark dataframe

write_featureframe()

Writes a dataframe of data as a training dataset on HDFS in the hdf5 format

Returns:
None
Raises:
ValueError: if the user supplied a write mode that is not supported

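
This reference does not state which write modes are accepted. A minimal sketch of the kind of check that leads to such a ValueError, assuming purely for illustration that only "overwrite" is allowed (the SUPPORTED_WRITE_MODES set and the check_write_mode helper are hypothetical names, not part of hops):

```python
# Assumption for illustration only: the accepted modes are hypothetical.
SUPPORTED_WRITE_MODES = frozenset({"overwrite"})


def check_write_mode(write_mode):
    """Return write_mode unchanged, or raise ValueError if it is unsupported."""
    if write_mode not in SUPPORTED_WRITE_MODES:
        raise ValueError(
            "write mode {!r} is not supported; expected one of {}".format(
                write_mode, sorted(SUPPORTED_WRITE_MODES)
            )
        )
    return write_mode
```

Validating the mode before any data is written means an unsupported request fails early, without leaving a partially written dataset behind.
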
class hops.featurestore_impl.featureframes.FeatureFrame.ImageFeatureFrame(**kwargs)

Bases: hops.featurestore_impl.featureframes.FeatureFrame.FeatureFrame

FeatureFrame implementation for image training datasets

__init__(**kwargs)

Initialize featureframe state using the parent class constructor

read_featureframe(spark)

Reads a training dataset in image format from HopsFS

Args:
spark: the spark session
Returns:
dataframe with the data of the training dataset
Raises:
TrainingDatasetNotFound: if the requested training dataset could not be found

write_featureframe()

Writes a dataframe of data as a training dataset on HDFS in the image format

Returns:
None
Raises:
ValueError: whenever this method is called, because writing datasets in “image” format is not supported with Spark

class hops.featurestore_impl.featureframes.FeatureFrame.NumpyFeatureFrame(**kwargs)

Bases: hops.featurestore_impl.featureframes.FeatureFrame.FeatureFrame

FeatureFrame implementation for Numpy training datasets

__init__(**kwargs)

Initialize featureframe state using the parent class constructor

read_featureframe(spark)

Reads a training dataset in numpy format from HopsFS

Args:
spark: the spark session
Returns:
dataframe with the data of the training dataset
Raises:
TrainingDatasetNotFound: if the requested training dataset could not be found
CouldNotConvertDataframe: if the numpy dataset could not be converted to a spark dataframe

write_featureframe()

Writes a dataframe of data as a training dataset on HDFS in the npy format

Returns:
None
Raises:
ValueError: if the user supplied a write mode that is not supported

class hops.featurestore_impl.featureframes.FeatureFrame.ORCFeatureFrame(**kwargs)

Bases: hops.featurestore_impl.featureframes.FeatureFrame.FeatureFrame

FeatureFrame implementation for ORC training datasets

__init__(**kwargs)

Initialize featureframe state using the parent class constructor

read_featureframe(spark)

Reads a training dataset in orc format from HopsFS

Args:
spark: the spark session
Returns:
dataframe with the data of the training dataset
Raises:
TrainingDatasetNotFound: if the requested training dataset could not be found

write_featureframe()

Writes a dataframe of data as a training dataset on HDFS in the orc format

Returns:
None

class hops.featurestore_impl.featureframes.FeatureFrame.ParquetFeatureFrame(**kwargs)

Bases: hops.featurestore_impl.featureframes.FeatureFrame.FeatureFrame

FeatureFrame implementation for Parquet training datasets

__init__(**kwargs)

Initialize featureframe state using the parent class constructor

read_featureframe(spark)

Reads a training dataset in Parquet format from HopsFS

Args:
spark: the spark session
Returns:
dataframe with the data of the training dataset
Raises:
TrainingDatasetNotFound: if the requested training dataset could not be found

write_featureframe()

Writes a dataframe of data as a training dataset on HDFS in the Parquet format

Returns:
None

class hops.featurestore_impl.featureframes.FeatureFrame.PetastormFeatureFrame(**kwargs)

Bases: hops.featurestore_impl.featureframes.FeatureFrame.FeatureFrame

FeatureFrame implementation for Petastorm training datasets

__init__(**kwargs)

Initialize featureframe state using the parent class constructor

read_featureframe(spark)

Reads a training dataset in petastorm format from HopsFS

Args:
spark: the spark session
Returns:
dataframe with the data of the training dataset
Raises:
TrainingDatasetNotFound: if the requested training dataset could not be found

write_featureframe()

Writes a dataframe of data as a training dataset on HDFS in the petastorm format

Returns:
None
Raises:
ValueError: if no petastorm schema was provided

class hops.featurestore_impl.featureframes.FeatureFrame.TFRecordsFeatureFrame(**kwargs)

Bases: hops.featurestore_impl.featureframes.FeatureFrame.FeatureFrame

FeatureFrame implementation for TFRecords training datasets

__init__(**kwargs)

Initialize featureframe state using the parent class constructor

read_featureframe(spark)

Reads a training dataset in tfrecords format from HopsFS

Args:
spark: the spark session
Returns:
dataframe with the data of the training dataset
Raises:
TrainingDatasetNotFound: if the requested training dataset could not be found

write_featureframe()

Writes a dataframe of data as a training dataset on HDFS in the tfrecords format

Returns:
None
Raises:
ValueError: if the user supplied a write mode that is not supported

class hops.featurestore_impl.featureframes.FeatureFrame.TSVFeatureFrame(**kwargs)

Bases: hops.featurestore_impl.featureframes.FeatureFrame.FeatureFrame

FeatureFrame implementation for TSV training datasets

__init__(**kwargs)

Initialize featureframe state using the parent class constructor

read_featureframe(spark)

Reads a training dataset in TSV format from HopsFS

Args:
spark: the spark session
Returns:
dataframe with the data of the training dataset
Raises:
TrainingDatasetNotFound: if the requested training dataset could not be found

write_featureframe()

Writes a dataframe of data as a training dataset on HDFS in the TSV format

Returns:
None

Module contents