Releases: lakesoul-io/LakeSoul
v2.5.4
v2.5.3
- Add shaded packages for release
- Fix issue where compaction may write to an incorrect partition
v2.5.1
- Fix Flink sink parallelism for non-primary-key tables;
- Fix native IO filter for non-ASCII names and nested columns;
- Optimize compaction performance.
v2.5.0 & Python 1.0.0b1
LakeSoul 2.5.0 Release Note
What's New
- Python Reader supports PyTorch, PyArrow, Pandas, Ray, and distributed execution;
- Support Spark Gluten Vectorized Engine;
- Spark SQL supports Compaction, Rollback and other Call Procedures;
- Flink CDC’s entire database synchronization supports MySQL, PostgreSQL, PolarDB, and Oracle;
- Support streaming and batch export to MySQL, PostgreSQL, PolarDB, and Apache Doris;
- Optimized NativeIO performance.
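The distributed-execution support in the Python reader boils down to sharding a table's file splits across workers (e.g. Ray tasks). The sketch below is a hypothetical, pure-Python illustration of balanced split assignment; `FileSplit` and `assign_splits` are invented names for illustration, not LakeSoul's actual API.

```python
# Hypothetical sketch: shard a table's file splits across N workers so each
# worker reads a roughly equal number of bytes. Not LakeSoul's real API.
from dataclasses import dataclass
from typing import List

@dataclass
class FileSplit:
    path: str
    size_bytes: int

def assign_splits(splits: List[FileSplit], num_workers: int) -> List[List[FileSplit]]:
    """Greedily assign each split to the currently least-loaded worker."""
    buckets: List[List[FileSplit]] = [[] for _ in range(num_workers)]
    loads = [0] * num_workers
    # Largest splits first makes the greedy balance noticeably tighter.
    for split in sorted(splits, key=lambda s: s.size_bytes, reverse=True):
        i = loads.index(min(loads))
        buckets[i].append(split)
        loads[i] += split.size_bytes
    return buckets
```

Each worker would then open only the files in its own bucket, which is the usual pattern behind "distributed execution" over a shared table.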
What's Changed
- [Spark]rename MetaVersion at lakesoul-spark as SparkMetaVersion by @Ceng23333 in #353
- [Metadata]Replace table_info.table_schema with arrow kind schema (Backward Compatibility) by @Ceng23333 in #354
- [Python][Dataset] Add Ray reading support by @codingfun2022 in #355
- [Spark] Optimize incremental read and fix column disorder caused by the compact operation by @F-PHantam in #352
- [Rust] Create Rust CI by @Ceng23333 in #356
- [Rust][Metadata]Create Rust MetadataClient & add CI test cases by @Ceng23333 in #357
- [Rust][NativeIO]Use stable rustc for lakesoul-io feature default by @Ceng23333 in #358
- [Python][Rust][Metadata] Update python metadata interface && Full arrow types test by @Ceng23333 in #359
- [Spark] Spark Sql Support 'drop partition' Operation by @F-PHantam in #360
- [Python]python deserialized schema from java by @Ceng23333 in #361
- [Python] Fix wheel building; update version to 1.0.0b1 by @codingfun2022 in #362
- [Rust][Metadata]Asynchronized rust metadata method by @Ceng23333 in #365
- Add some rust test cases by @zhaishuangszszs in #364
- [Datafusion]Implement LakeSoul Catalog by @Ceng23333 in #366
- [Rust] add upsert test cases by @zhaishuangszszs in #367
- [Flink] update fury version to 0.4 by @xuchen-plus in #368
- refine upsert test by @zhaishuangszszs in #369
- [Spark] support call sql syntax by @moresun in #370
- [Rust]DataFusion version upgraded to 33.0.0 by @Ceng23333 in #372
- [Spark] Support Gluten Vectorized Engine by @xuchen-plus in #374
- [Flink] Support oracle cdc source by @ChenYunHey in #375
- [NativeIO] Use rust block api in file read by @xuchen-plus in #377
- [Flink] Add export to external dbs for LakeSoul's tables by @ChenYunHey in #376
- [Rust] Add LakeSoulHashTable Sink for DataFusion by @Ceng23333 in #382
- [NativeIO] Enable parquet rowgroup prefetch. Support s3 host style access by @xuchen-plus in #384
- [Rust]fix hash value to spark_murmur3 by @Ceng23333 in #385
- [BugFix] Fix failure when creating a table with a nullable hash column by @Ceng23333 in #387
- [Flink] Add Jdbc cdc sources and sinks by @ChenYunHey in #381
- [Python] fix python meta config parse logic by @xuchen-plus in #388
- [Project/Doc] Bump version to 2.5.0 and update docs by @xuchen-plus in #389
- Bump postcss from 8.4.23 to 8.4.33 in /website by @dependabot in #396
- Bump @babel/traverse from 7.21.5 to 7.23.7 in /website by @dependabot in #393
- Bump follow-redirects from 1.15.2 to 1.15.4 in /website by @dependabot in #399
- Bump org.apache.avro:avro from 1.11.0 to 1.11.3 in /lakesoul-spark by @dependabot in #394
- Bump com.google.guava:guava from 30.1.1-jre to 32.0.0-jre in /lakesoul-presto by @dependabot in #395
- [Rust] Update arrow rs dependencies by @xuchen-plus in #400
New Contributors
- @zhaishuangszszs made their first contribution in #364
Full Changelog: v2.4.1...v2.5.0
Release v2.4.1
What's Changed
- [Flink] Flink can configure global warehouse dir by @F-PHantam in #342
- [NativeIO] Implement DataFusion TableProvider by @Ceng23333 in #341
- [Spark]Spark parquet filter pushdown exactly by @Ceng23333 in #343
- [Spark]Spark parquet filter pushdown evaluation + bugfix by @Ceng23333 in #344
- [Meta] fix meta field compatibility in partition info table by @xuchen-plus in #345
- [Common] Cleanup redundant DataOperation by @Ceng23333 in #346
- [Docs] add kyuubi with lakesoul setup doc. by @Asakiny in #348
- [Native-Metadata] Adaptive jnr buffer size by @Ceng23333 in #347
- [NativeIO][Bug] LakeSoulParquetProvider projection bugfix by @Ceng23333 in #349
- [NativeIO] Enable parquet prefetch & use stable sort by @xuchen-plus in #350
Full Changelog: v2.4.0...v2.4.1
LakeSoul Release v2.4.0 and Python 1.0 Beta
What's New In This Release
- RBAC support for all query engines. doc
- Auto cleaning of old compaction data and partition TTL. doc
- Upgrade Flink version to 1.17 and support row level update/delete in batch sql.
- Improve whole-database Flink CDC sync throughput by 80%: #307
- Presto Reader; doc
- Python reader and integration with PyTorch and HuggingFace. doc
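Partition TTL (listed above) amounts to comparing each partition's newest commit time against a retention window and expiring the partitions that fall outside it. The following is a minimal, stdlib-only sketch of that decision; the function name and inputs are hypothetical, and LakeSoul's real cleanup runs inside its metadata service.

```python
# Hedged sketch of partition-TTL cleanup: given each partition's latest commit
# time and a TTL, decide which partitions are expired. Illustrative only.
from datetime import datetime, timedelta
from typing import Dict, List

def expired_partitions(last_commit: Dict[str, datetime],
                       ttl: timedelta,
                       now: datetime) -> List[str]:
    """Return partitions whose newest data is older than the TTL."""
    return sorted(p for p, t in last_commit.items() if now - t > ttl)
```

A cleanup job would run this periodically and drop the data files (and metadata) of every returned partition.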
What's Changed
- [NativeIO] Upgrade datafusion to 27 by @xuchen-plus in #282
- [Flink] implement filter pushdown and fix partition pushdown in flink by @xuchen-plus in #287
- Upgrade Flink to 1.17 by @xuchen-plus in #288
- [Python][NativeIO] Add C interface definition by @xuchen-plus in #291
- [NativeIO] update arrow version by @xuchen-plus in #290
- Add Built-in RBAC support by @clouddea in #292
- fix apache license by @clouddea in #293
- [Native-Metadata] Rust implementation of DAO layer by @Ceng23333 in #294
- [Flink] fix jackson-core package in flink by @xuchen-plus in #297
- [Docs] update docs by @xuchen-plus in #298
- [Flink] upgrade flink cdc connector to 2.4 by @xuchen-plus in #303
- clean old compaction data and redundant data by @ChenYunHey in #304
- [Python][Native-Metadata] Python interface of lakesoul metadata by @Ceng23333 in #305
- [Python] C callback with data by @xuchen-plus in #306
- [Python][Dataset] PyArrow and PyTorch dataset api for LakeSoul by @codingfun2022 in #308
- [Flink] rollback flink cdc to 2.3.0 and supplement tables check in benchmark by @F-PHantam in #309
- [Flink] Optimize CDC sink serde with Fury by @xuchen-plus in #307
- [NativeIO] add hdfs feature in lakesoul-io-c by @xuchen-plus in #311
- [Python] exclude partition column at get_arrow_schema_by_table_name by @Ceng23333 in #312
- [Native-Metadata] Retry when native metadata client fail by @Ceng23333 in #313
- [Flink] cdc supplement data delay check mechanism and fix logicallyDropColumn bug by @F-PHantam in #315
- Presto Connector Support by @clouddea in #314
- add scala in common to address build in idea intellij by @xuchen-plus in #316
- [Flink] Ignore exception when hadoop env missing by @xuchen-plus in #317
- [NativeIO] Merge native modules by @Ceng23333 in #318
- bump version to 2.4.0 by @xuchen-plus in #319
- [RBAC] Set hdfs dir owner by @xuchen-plus in #321
- [BugFix]support query metadata with null string by @Ceng23333 in #324
- [Spark] list namespace should return empty array by @xuchen-plus in #323
- [Python][Dataset] Update Python dataset api for LakeSoul by @codingfun2022 in #325
- [Python] Examples using Python API for AI model training by @Ceng23333 in #327
- update docs and readme for release 2.4 by @xuchen-plus in #328
- [Docs] Usage on auto table clean by @ChenYunHey in #326
- [Docs] Add presto connector deployment docs by @xuchen-plus in #329
- [Docs] Add docs for Python and PyTorch by @Ceng23333 in #330
- [Docs] add workspace and rbac docs by @xuchen-plus in #331
- [Bug] turn off native meta query and temporarily disable io prefetch by @F-PHantam in #333
- [Bug]filter should not pushdown before merge on read by @Ceng23333 in #310
- Support view, batch update, batch delete in Flink by @moresun in #332
- [Docs ] Refine flink sql and python docs by @xuchen-plus in #337
Full Changelog: https://github.com/lakesoul-io/LakeSoul/commits/v2.4.0
LakeSoul Release v2.3.1
- Fix jackson-core packaging for Flink package
- Fix commons-lang class missing
- Fix snapshot rollback/cleanup with local timezone
LakeSoul Release v2.3.0
v2.3.0 Release Notes
This is the first release after LakeSoul was donated to Linux Foundation AI & Data. This release contains the following major new features:
- Flink Connector for Flink SQL/Table API to read and write LakeSoul in both batch and streaming mode, with support for Flink Changelog Stream semantics and row-level upsert and delete. See docs Flink Connector.
- Flink CDC Ingestion refactored to infer new tables and schema changes automatically from messages. This enables simpler CDC stream ingestion job development for any kind of database or message queue.
- Global automatic compaction service. See docs Auto Compaction Service.
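Flink's Changelog Stream semantics, which the connector above supports, encode each row as an insert (`+I`), update before/after image (`-U`/`+U`), or delete (`-D`). The sketch below applies such a changelog to a keyed table in pure Python; it is a conceptual illustration of the semantics, not the connector's implementation.

```python
# Minimal sketch of applying Flink changelog row kinds to keyed state:
# +I insert, -U retract old value, +U new value, -D row-level delete.
from typing import Dict, Iterable, Tuple

def apply_changelog(rows: Iterable[Tuple[str, str, dict]]) -> Dict[str, dict]:
    """rows: (row_kind, primary_key, value). Returns the final table state."""
    state: Dict[str, dict] = {}
    for kind, key, value in rows:
        if kind in ("+I", "+U"):
            state[key] = value      # insert, or after-image of an update
        elif kind == "-D":
            state.pop(key, None)    # row-level delete
        # "-U" (before-image) carries no new state for an upsert-style sink
    return state
```

An upsert sink like LakeSoul's can ignore `-U` records because the following `+U` fully replaces the row for that primary key.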
What's Changed
- [NativeIO] Native io misc improvements by @dmetasoul01 in #190
- optimize filesForScan by @F-PHantam in #192
- Add Definition Comments for com.dmetasoul.lakesoul.meta.entity by @YuChangHui in #193
- Implement Delta Join Interfaces for LakeSoulTable by @YuChangHui in #184
- [Flink] pack paranamer to flink release jar by @dmetasoul01 in #196
- [NativeIO] use tcmalloc as global allocator by @xuchen-plus in #204
- [NativeIO] fix memory leak in native reader by @xuchen-plus in #209
- [Flink] avoid cast global parameter to ParameterTool by @xuchen-plus in #207
- migrate arrow-rs and datafusion deps to new org by @xuchen-plus in #211
- Implement Global Automatic Disaggregated Compaction Service by @F-PHantam in #212
- Implement Flink ScanTableSource and LookupTableSource by @YuChangHui in #213
- fix data type timestamp with zone by @lypnaruto in #215
- [NativeIO] throw exception in LakeSoulArrowReader.hasNext by @Ceng23333 in #217
- [NativeIO]add rust clippy workflow && fix clippy error/warn by @Ceng23333 in #219
- add flink sql submitter(#199) by @Hades-888 in #221
- Update readme by @xuchen-plus in #222
- bump version to 2.3.0 by @xuchen-plus in #223
- update github links by @xuchen-plus in #224
- fix bug: requested file schema no change in stream task by @F-PHantam in #226
- [Flink]LakeSoulCatalog::listTables: list tableName instead of tablePath by @Ceng23333 in #227
- [Flink]fix parse error of LogicalTypeRoot::Date by @Ceng23333 in #228
- [NativeIO]panic when target datatype and source datatype mismatch by @Ceng23333 in #214
- [Flink]support flink decimal by @Ceng23333 in #232
- update LakeSoulTableSource.getChangelogMode by @Ceng23333 in #231
- [NativeIO]fix clippy warning by @Ceng23333 in #230
- Fix hash bucket num by @xuchen-plus in #233
- [Flink]add batch in flink sql submitter by @Hades-888 in #234
- disable tcmalloc by @xuchen-plus in #235
- [Project] add lakesoul project website code by @xuchen-plus in #237
- update load flink sql from hdfs in yarn application by @Hades-888 in #238
- [Flink]add Maven-test CI for lakesoul-flink by @lypnaruto in #239
- Add cross build for native io by @xuchen-plus in #241
- [Project] disable git lfs by @xuchen-plus in #243
- fix bugs for the same bucket read by different stream tasks by @moresun in #245
- [Project] Add pr checks and deployment actions by @xuchen-plus in #244
- [Flink]fix FlinkDatatype::timestamp_ltz zone conversion && support FlinkDatatype::timestamp by @Ceng23333 in #246
- Prepare meta in maven test by @xuchen-plus in #247
- [Flink]Fix LookupSource FS configuration setting by @Ceng23333 in #248
- LakeSoul mysql cdc convert Datatype::datetime to timestamp with timezone by @F-PHantam in #249
- [Spark] Fix compatibility with spark 3.3.2 by @xuchen-plus in #251
- add flink source and sink ci test by @F-PHantam in #252
- [Flink] fix wrong logging config file in flink test by @xuchen-plus in #253
- [Flink] Move partition column fill to native io by @xuchen-plus in #254
- Fix datatype conversion from flink to spark by @Ceng23333 in #255
- [Flink] Add source failover test cases by @xuchen-plus in #256
- [Flink] LakeSoulSinkGlobalCommitter by @Ceng23333 in #257
- add LAKESOUL_PARTITION_SPLITTER as constant by @Ceng23333 in #260
- remove guava and commons-lang in common module by @xuchen-plus in #261
- Modify mysqlcdc sort key generation way by @F-PHantam in #263
- [Flink] Add sink failover test cases by @Ceng23333 in #259
- [Flink] Fix flink reader npe by @xuchen-plus in #265
- [Flink]complete test options of sink fail tests by @Ceng23333 in #266
- Refine meta partition values by @xuchen-plus in #267
- [Flink]Check schema migration at GlobalCommitter by @Ceng23333 in #269
- Fix meta exception handling by @xuchen-plus in #270
- Update website and readme for 2.3.0 release by @xuchen-plus in #271
v2.2.0
LakeSoul Release v2.2.0
v2.2.0 Release Notes
- Native IO is by default enabled for Flink CDC Sink and Spark SQL. Native IO uses arrow-rs and Datafusion with special IO optimizations based on arrow-rs' object store. Benchmarks show 3x IO throughput improvement over parquet-mr and Hadoop filesystem. Native IO supports both HDFS and S3 object storage (including S3 protocol compatible storages). Native IO supports all data types in Spark and Flink and has passed both TPC-H and CHBenchmark correctness tests.
- Snapshot read and incremental read support on Spark. LakeSoul's incremental read on Spark supports both batch mode and micro-batch streaming mode.
- Default supported Spark's version has been upgraded to Spark 3.3.
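Snapshot and incremental reads (above) can be understood as filters over timestamped commits: a snapshot read sees everything committed up to some timestamp, while an incremental read sees only the commits in a half-open window. The sketch below is a conceptual, stdlib-only model, not LakeSoul's actual metadata layout.

```python
# Conceptual sketch of snapshot vs. incremental reads over commits.
from typing import List, Tuple

Commit = Tuple[int, str]  # (commit_timestamp, file_path)

def snapshot_read(commits: List[Commit], ts: int) -> List[str]:
    """All files committed at or before ts (time travel to ts)."""
    return [f for t, f in commits if t <= ts]

def incremental_read(commits: List[Commit], start: int, end: int) -> List[str]:
    """Only files committed in the window (start, end]."""
    return [f for t, f in commits if start < t <= end]
```

Micro-batch streaming then reduces to repeated incremental reads, each batch advancing `start` to the previous batch's `end`.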
What's Changed
- [Feature] Timestamp based snapshot read, rollback and cleanup by @dmetasoul01 in #104
- [Flink] write timestamp to int64 instead of int96 in flink sink by @dmetasoul01 in #106
- Only one partition and compaction to parquet scan by @F-PHantam in #109
- Bump postgresql from 42.5.0 to 42.5.1 in /lakesoul-common by @dependabot in #111
- Incremental query by @lypnaruto in #110
- Add Benchmarks by @dmetasoul01 in #115
- Flink serde optimization by @dmetasoul01 in #117
- Develop/native io spark by @Ceng23333 in #118
- Fix CI with Maven Test by @Ceng23333 in #121
- Support Kafka multiple topics sync to LakeSoul by @F-PHantam in #122
- solve dependency problem of confluent jar by @F-PHantam in #124
- fix maven-test with native-io by @Ceng23333 in #125
- [NativeIO] Native io parquet writer implementation by @dmetasoul01 in #128
- [Spark] Streaming Read by @lypnaruto in #129
- [Spark] Upgrade Spark version to 3.3 for main branch by @dmetasoul01 in #132
- use Arrow Schema instead of HashMap for lakesoul_reader filter by @YuChangHui in #136
- [NativeIO] Native writer c and jnr-ffi interface by @dmetasoul01 in #137
- [NativeIO] fix native reader memory leak and double free by @dmetasoul01 in #138
- [NativeIO] Native writer with primary keys sort support by @dmetasoul01 in #141
- [NativeIO] Use ffi to pass arrow schema by @dmetasoul01 in #142
- [NativeIO][Flink] Implement Flink native writer by @dmetasoul01 in #143
- [NativeIO] fix callback object reference by @dmetasoul01 in #145
- [NativeIO] upgrade arrow-rs to 31 and datafusion to 17 by @dmetasoul01 in #148
- [NativeIO][Spark] Package native lib in lakesoul-spark jar by @dmetasoul01 in #149
- [NativeIO] use maven profile for native packaging. default to local native build by @dmetasoul01 in #150
- [NativeIO][Spark] Integrate nativeIO writer in lakesoul-spark by @F-PHantam in #151
- [NativeIO] Implement Sorted Stream Merger by @Ceng23333 in #147
- fix ParquetNativeFilterSuite by @Ceng23333 in #152
- [NativeIO][Bug] Fix flink writer panic by @dmetasoul01 in #154
- [NativeIO] optimize with smallvec for native merge by @dmetasoul01 in #155
- [NativeIO][Flink] fix flink writer batch reset in java before write by @dmetasoul01 in #157
- [NativeIO][Spark]Implement Interfaces for LakeSoulScanBuilder with Native-IO by @Ceng23333 in #156
- [NativeIO] Native hdfs object store by @dmetasoul01 in #159
- Add python api for snapshot and incremental query by @lypnaruto in #160
- fix memory leak. add columnar supports by @dmetasoul01 in #164
- [NativeIO] upgrade arrow version to 11 by @dmetasoul01 in #173
- support date type for primary key in Flink CDC by @moresun in #174
- [NativeIO][Flink] Fix Flink CDC Data Sort Bug and Handle DataType Change Issues from Mysql to LakeSoul by @F-PHantam in #175
- fix native_io_timestamp_conversion for default case by @Ceng23333 in #176
- Fix flink ci by @dmetasoul01 in #177
- fix invalid LakeSoulSQLConf max_row_group_size in native_io_writer by @YuChangHui in #179
- fix snapshot query default start time by @YuChangHui in #182
- fix unexpected close of partitionColumnVectors in closeCurrentBatch by @Ceng23333 in #185
- add support for non-PK streaming read in Spark by @moresun in #188
- upgrade jffi to 1.3.11 for centos 7 by @dmetasoul01 in #189
- [Native-IO]add native_io support for empty schema and struct type by @Ceng23333 in #180
Full Changelog: https://github.com/meta-soul/LakeSoul/commits/v2.2.0
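Among the NativeIO changes above, the Sorted Stream Merger (#147) underpins merge-on-read: several files, each sorted by primary key, are merged in key order, and for duplicate keys the newer file's record wins. Below is a hedged, pure-Python analogue of that idea (the real implementation is in Rust inside NativeIO); it assumes each record carries a version and that `(key, version)` pairs are unique across streams.

```python
# Sketch of a sorted stream merger for merge-on-read: k-way merge of
# key-sorted streams, keeping the highest-version record per key.
import heapq
from typing import List, Tuple

def merge_streams(streams: List[List[Tuple[str, int, dict]]]) -> List[Tuple[str, dict]]:
    """Each stream is sorted by key; records are (key, version, value)."""
    out: List[Tuple[str, dict]] = []
    best_version = -1
    for key, version, value in heapq.merge(*streams):
        if out and out[-1][0] == key:
            if version > best_version:      # newer upsert replaces older row
                out[-1] = (key, value)
                best_version = version
        else:
            out.append((key, value))
            best_version = version
    return out
```

Because the inputs stay sorted, the merge emits results in key order without materializing all files, which is what makes upsert-heavy tables cheap to read before compaction.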
v2.1.1
What's Changed
This is a bug fix release for v2.1.0.
Fixed bugs:
- Support geometry/point type in flink cdc by @Ceng23333 in #93
- [BUG] fix pg password auth failed exception by @dmetasoul01 in #95
- Add checkpoint_mode to flink job entry by @Ceng23333 in #96
Full Changelog: 2.1.0...v2.1.1