SFaker

SFaker logo

Badges: Apache V2 license · "Test with Maven" CI

SFaker is a data generator implemented on top of the Spark DataSourceV2 API. It generates rows according to a user-specified schema.

Features

- Batch reads
- Streaming reads (TBD)
- DataFrameReader API
- Spark SQL CREATE TABLE statement
- UnsafeRow output
- Codegen (code-generated row production)
- Limit push-down (see the plan-inspection sketch below)
- Column pruning (see the plan-inspection sketch below)
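
Limit push-down and column pruning can be verified by inspecting the physical plan. The sketch below is illustrative only: it assumes a DataFrame df read from FakeSource as in the Use Cases section, and the exact operator and scan names in the plan output depend on the Spark version and on the SFaker implementation.

// Only "id" is requested and a LIMIT is applied, so the FakeSource scan
// should report a pruned output schema and a pushed-down limit.
df.select("id").limit(10).explain(true)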

Types

The following Spark SQL types are supported; see the Spark SQL data types documentation for details on each type.

- Byte
- Short
- Integer
- Long
- Float
- Double
- Decimal (TBD)
- String
- Varchar (TBD)
- Char (TBD)
- Binary (TBD)
- Boolean
- Date (TBD)
- Timestamp (TBD)
- TimestampNTZ (TBD)
- YearMonthInterval (TBD)
- DayTimeInterval (TBD)
- Array
- Map
- Struct
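
Nested types can be combined in a single schema. The snippet below is an illustrative sketch built with the standard Spark types API; the column names are arbitrary, and the schema can be passed to a FakeSource read exactly like the flat schema in the Use Cases section.

import org.apache.spark.sql.types._

// Example schema mixing the supported nested types: Array, Map and Struct.
val nestedSchema = new StructType()
  .add("tags", ArrayType(StringType))
  .add("attributes", MapType(StringType, IntegerType))
  .add("address", new StructType()
    .add("city", StringType)
    .add("zip", IntegerType))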

Config

| Conf | Type | Default | Description |
| --- | --- | --- | --- |
| spark.sql.fake.source.unsafe.row.enable | Boolean | false | If true, all generated rows are stored as UnsafeRow. |
| spark.sql.fake.source.unsafe.codegen.enable | Boolean | false | If true, the process that generates rows according to the schema is executed with generated code (JIT mode). |
| spark.sql.fake.source.partitions | Integer | 1 | Number of source partitions. |
| spark.sql.fake.source.rowsTotalSize | Integer | 8 | Number of rows generated according to the schema. |
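
When none of these options are set, the defaults above apply: one partition and 8 generated rows. A minimal sketch, assuming the SFaker jar is on the classpath and an existing spark session:

import org.apache.spark.sql.types.{DataTypes, StructType}

// Relies only on the defaults: partitions = 1, rowsTotalSize = 8.
val defaultsDf = spark.read
  .format("FakeSource")
  .schema(new StructType().add("id", DataTypes.IntegerType))
  .load()

defaultsDf.rdd.getNumPartitions  // expected: 1
defaultsDf.count()               // expected: 8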

Use Cases

DataFrameReader API

import org.apache.spark.sql.types.{DataTypes, StructType}

// Schema that FakeSource will generate rows for.
val schema = new StructType()
  .add("id", DataTypes.IntegerType)
  .add("sex", DataTypes.BooleanType)
  .add("roles", DataTypes.createArrayType(DataTypes.StringType))

val df = spark.read
  .format("FakeSource")
  .schema(schema)
  .option(FakeSourceProps.CONF_ROWS_TOTAL_SIZE, 100)
  .option(FakeSourceProps.CONF_PARTITIONS, 1)
  .option(FakeSourceProps.CONF_UNSAFE_ROW_ENABLE, true)
  .option(FakeSourceProps.CONF_UNSAFE_CODEGEN_ENABLE, true)
  .load()
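
A quick way to sanity-check the generated data, continuing the snippet above:

df.printSchema()               // prints the three fields defined above
df.show(5, truncate = false)   // five rows of generated values
df.count()                     // expected: 100, matching CONF_ROWS_TOTAL_SIZE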

Spark SQL Create Statement

import org.apache.spark.sql.SparkSession

// Register FakeSourceCatalog (provided by SFaker) so that
// CREATE TABLE ... USING FakeSource works in plain SQL.
val spark = SparkSession
  .builder()
  .master("local[*]")
  .appName("Case0")
  .config(
    "spark.sql.catalog.spark_catalog",
    classOf[FakeSourceCatalog].getName
  )
  .getOrCreate()

spark.sql("""
    |create table fake (
    |  id int,
    |  sex boolean
    |)
    |using FakeSource
    |tblproperties (
    |  spark.sql.fake.source.rowsTotalSize = 10000000,
    |  spark.sql.fake.source.partitions = 1,
    |  spark.sql.fake.source.unsafe.row.enable = true,
    |  spark.sql.fake.source.unsafe.codegen.enable = true
    |)
    |""".stripMargin)

// The limit and the column pruning should be pushed down to the FakeSource scan.
spark.sql("select id from fake limit 10").explain(true)
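
To materialize rows rather than just inspect the plan, query the table as usual, continuing the session above:

// Pulls generated rows out of the FakeSource-backed table.
spark.sql("select id, sex from fake limit 5").show()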

Star History

Star History Chart

License

Apache 2.0 License.