Caused by: java.lang.ClassCastException: java.lang.Double cannot be cast to java.lang.String at com.salesforce.op.features.types.FeatureTypeSparkConverter$$anonfun$2.apply(FeatureTypeSparkConverter.scala:146) #520

hjfrank1991 · 2020-10-19T14:22:06Z

I am using the iris.csv data:

1,5.1,3.5,1.4,0.2,Iris-setosa
1,4.9,3.0,1.4,0.2,Iris-setosa
1,4.7,3.2,1.3,0.2,Iris-setosa

So I created a StructType like this:

    val schema = StructType(
      Array(
        StructField("id", IntegerType, nullable = false),
        StructField("sepalLength", DoubleType, nullable = false).withComment("feature"),
        StructField("sepalWidth", DoubleType, nullable = false).withComment("feature"),
        StructField("petalLength", DoubleType, nullable = false).withComment("feature"),
        StructField("petalWidth", DoubleType, nullable = false).withComment("feature"),
        StructField("irisClass", StringType, nullable = false).withComment("label")
      )
    )

Next, I get the label column and the feature columns:

val dataFrame = ...
val name = "irisClass"
val (irisClass, predictors)  = FeatureBuilder.fromDataFrame[Text](dataFrame, response = name)

The id column is neither a label nor a feature, but with this call id is also treated as a feature column, which I don't want.
So I keep only the columns whose comment is label or feature and drop the other columns:

val frame = dataFrame.drop("id")
val (irisClass, predictors)  = FeatureBuilder.fromDataFrame[Text](frame, response = name)

// Extract response and predictor Features
val (survived, predictors) = FeatureBuilder.fromDataFrame[Text](dataFrame, response = name)

// Automated feature engineering
val featureVector = predictors.transmogrify()

// Automated feature validation and selection
val index = survived.indexed("__unknown", StringIndexerHandleInvalid.Keep)

val checkedFeatures = index.sanityCheck(featureVector, removeBadFeatures = true)

val pred = MultiClassificationModelSelector
  //.withCrossValidation()
  .withTrainValidationSplit()
  .setInput(index, checkedFeatures)
  .setOutputFeatureName("pred")
  .getOutput()

// Setting up a TransmogrifAI workflow and training the model
val model: OpWorkflowModel = new OpWorkflow()
  .setInputDataset(frame)
  .setResultFeatures(pred)
  .train()

// save
model.save(path = "/model/automl", overwrite = true)

// load
val loadmodel = OpWorkflowModel.load("/model/automl")

// getAllFeatures
val features = loadmodel.getRawFeatures().map(_.name)

// use model to predict new data 
// Changing the order of columns
val frame3 = frame.select(features.head, features.tail: _*)
val dataFrame1 = loadmodel.setInputDataset(frame3)
  .score()
dataFrame1.show(false)

But I get this error:

Caused by: java.lang.ClassCastException: java.lang.Double cannot be cast to java.lang.String
	at com.salesforce.op.features.types.FeatureTypeSparkConverter$$anonfun$2.apply(FeatureTypeSparkConverter.scala:146)


hjfrank1991 · 2020-10-19T14:24:53Z

If I change it to this:

val dataFrame1 = loadmodel.setInputDataset(frame)
.score()
dataFrame1.show(false)

it works. So when I use the model to predict on data, I can't change the order of the columns?

tovbinm · 2020-10-19T20:31:35Z

In your example you do not seem to be using the frame you created. Try this:

// Drop id column
val frame = dataFrame.drop("id")

// Extract response and predictor Features
val (irisClass, predictors) = FeatureBuilder.fromDataFrame[Text](frame, response = "irisClass")

// Automated feature engineering
val featureVector = predictors.transmogrify()

// Automated feature validation and selection
val index = irisClass.indexed("__unknown", StringIndexerHandleInvalid.Keep)
val checkedFeatures = index.sanityCheck(featureVector, removeBadFeatures = true)

val pred = MultiClassificationModelSelector
  .withTrainValidationSplit()
  .setInput(index, checkedFeatures)
  .setOutputFeatureName("pred")
  .getOutput()

// Setting up a TransmogrifAI workflow and training the model
val model: OpWorkflowModel = new OpWorkflow()
  .setInputDataset(frame)
  .setResultFeatures(pred)
  .train()

val scored = model.setInputDataset(frame).score()

scored.show(false)

hjfrank1991 · 2020-10-19T21:36:49Z

Sorry, that was a typo when I wrote the issue. In IDEA the code is correct:

// Extract response and predictor Features 
val (survived, predictors) = FeatureBuilder.fromDataFrame[Text](frame, response = name)

Your example works, but when I change this frame (reorder its columns and name the result frame_new) and then use the model to predict, I get the error:

val scored = model.setInputDataset(frame_new).score()

So the data we predict on has to keep the same column order?

hjfrank1991 · 2020-10-19T21:43:10Z

Also, can we use this like a Spark ML pipeline? For example:

val (irisClass, predictors1) = FeatureBuilder.fromDataFrame[Text](dataFrame, response = name)
val strindex = new OpStringIndexer()
  .setInput(irisClass)
  .setOutputFeatureName("index")

val strModel = strindex.fit(dataFrame)
val mm = strModel.getSparkMlStage() match {
  case Some ( x ) => x
}

val opdt = new OpDecisionTreeClassifier()
  .setInput(strindex.getOutput(), featureVector1)
  .setOutputFeatureName("dtPred")

val labels = mm.labels

val inde = new OpIndexToString()
  .setInput(strindex.getOutput())
  .setLabels(labels)
  .setOutputFeatureName("pred")

val pipelineModel = new Pipeline("getAlgorithmType")
  .setStages(Array(strindex, opdt, inde))
  .fit(dataFrame)

Do you have an example like that?

tovbinm · 2020-10-19T22:59:46Z

We never tried reordering the columns. In general, this should not be an issue, since we refer to the columns by their names. Why would you need to do it?

TransmogrifAI stages can be used in Spark ML pipelines as long as you maintain the naming conventions on the columns.

hjfrank1991 · 2020-10-20T00:25:40Z

After training the model, we reuse it to predict on another batch of data. The column names in that batch are the same, but the column order is different. If the model cannot read data whose column order has changed, that reduces its generality.

tovbinm · 2020-10-20T03:50:18Z

OK, I just went through the code. Each Feature that was constructed from a Dataframe Row has an index property which is used to locate the feature column in each row.
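To make the failure mode concrete, here is a minimal sketch in plain Scala of how a position-based lookup can produce this kind of ClassCastException once the columns are reordered. `TextFeature`, `SimpleRow`, `readByIndex`, and `readByName` are hypothetical stand-ins invented for illustration; they are not TransmogrifAI or Spark classes, and the real implementation may differ in detail.

```scala
import scala.util.Try

object PositionalLookupDemo {
  // Hypothetical stand-in for a raw feature that remembers a column position.
  final case class TextFeature(name: String, index: Int)

  // Hypothetical stand-in for a row: column names plus values in the same order.
  final case class SimpleRow(columns: Vector[String], values: Vector[Any])

  // Position-based read: casts whatever value sits at the remembered index.
  def readByIndex(f: TextFeature, row: SimpleRow): Try[String] =
    Try(row.values(f.index).asInstanceOf[String])

  // Name-based read: resolves the position from the row's own column names.
  def readByName(f: TextFeature, row: SimpleRow): Try[String] =
    Try(row.values(row.columns.indexOf(f.name)).asInstanceOf[String])

  // Column order seen when the feature was built: irisClass is at position 4.
  val trainingRow: SimpleRow = SimpleRow(
    Vector("sepalLength", "sepalWidth", "petalLength", "petalWidth", "irisClass"),
    Vector(5.1, 3.5, 1.4, 0.2, "Iris-setosa"))

  // Same names, different order: position 4 now holds a Double.
  val reorderedRow: SimpleRow = SimpleRow(
    Vector("irisClass", "sepalLength", "sepalWidth", "petalLength", "petalWidth"),
    Vector("Iris-setosa", 5.1, 3.5, 1.4, 0.2))

  val label: TextFeature = TextFeature("irisClass", index = 4)

  def main(args: Array[String]): Unit = {
    println(readByIndex(label, trainingRow))   // succeeds: position 4 is the label
    println(readByIndex(label, reorderedRow))  // fails: a Double is cast to String
    println(readByName(label, reorderedRow))   // succeeds: position found by name
  }
}
```

In this toy model the stored index is only valid for the column layout the feature was created from, which matches the symptom reported above.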

One option I see to overcome this is to recreate the features prior to scoring using the new dataset, then use them as input for the model.
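A different workaround, which is not the feature-recreation approach described above, is to reselect the columns of the new data by name in the order used for training before scoring. The sketch below uses only standard Spark calls (`select`, `col`, `columns`); `alignToTrainingOrder` is a hypothetical helper name, and the approach rests on the observation in this thread that scoring works when the column order matches the training frame.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object AlignColumns {
  // Hypothetical helper: select columns by name in the training order.
  // Fails fast if the new data is missing a column used in training.
  def alignToTrainingOrder(df: DataFrame, trainingColumns: Seq[String]): DataFrame = {
    val missing = trainingColumns.filterNot(df.columns.contains)
    require(missing.isEmpty, s"Missing columns: ${missing.mkString(", ")}")
    df.select(trainingColumns.map(col): _*)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("align-columns")
      .getOrCreate()
    import spark.implicits._

    // Column order of the frame used for training (after dropping id).
    val trainingColumns =
      Seq("sepalLength", "sepalWidth", "petalLength", "petalWidth", "irisClass")

    // New batch with the same column names in a different order.
    val shuffled = Seq(("Iris-setosa", 0.2, 1.4, 3.5, 5.1))
      .toDF("irisClass", "petalWidth", "petalLength", "sepalWidth", "sepalLength")

    val aligned = alignToTrainingOrder(shuffled, trainingColumns)
    aligned.printSchema()

    spark.stop()
  }
}
```

With the names from this thread, the call would look like `model.setInputDataset(alignToTrainingOrder(frame_new, frame.columns)).score()`, assuming the training column order (`frame.columns`) is saved alongside the model so it is available at scoring time.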

hjfrank1991 · 2020-10-20T11:29:50Z

I don't quite understand. Do you mean creating the features from the new data set and then using the original model to predict?

hjfrank1991 · 2020-10-20T11:39:14Z

When I use:
val features = loadmodel.getRawFeatures().map(_.name)
the order is also changed.
