Run Presto on SeaweedFS
chrislu edited this page Nov 13, 2023 · 32 revisions
The installation is divided into two parts: first set up the Hive Metastore, then set up Presto.
- Copy the seaweedfs-hadoop2-client-3.59.jar to the Hadoop and Hive Metastore lib directories, for example:

```bash
cp seaweedfs-hadoop2-client-3.59.jar /opt/hadoop/share/hadoop/common/lib/
cp seaweedfs-hadoop2-client-3.59.jar /opt/hive-metastore/lib/
```
- Modify core-site.xml

Modify core-site.xml to support SeaweedFS; 30888 is the filer port.

```xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>seaweedfs://10.0.100.51:30888</value>
  </property>
  <property>
    <name>fs.seaweedfs.impl</name>
    <value>seaweed.hdfs.SeaweedFileSystem</value>
  </property>
  <property>
    <name>fs.AbstractFileSystem.seaweedfs.impl</name>
    <value>seaweed.hdfs.SeaweedAbstractFileSystem</value>
  </property>
  <property>
    <name>fs.seaweed.buffer.size</name>
    <value>4194304</value>
  </property>
</configuration>
```
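The fs.seaweed.buffer.size property is specified in bytes; the value above corresponds to a 4 MiB write buffer:

```shell
# fs.seaweed.buffer.size is in bytes: 4 * 1024 * 1024 = 4194304 (4 MiB)
echo $((4 * 1024 * 1024))
```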
- Modify hive-site.xml

Modify hive-site.xml to support SeaweedFS. The /presto/warehouse directory must be created manually in the Filer. metastore.thrift.port is the port exposed by the Hive Metastore service itself.

```xml
<property>
  <name>metastore.warehouse.dir</name>
  <value>seaweedfs://10.0.100.51:30888/presto/warehouse</value>
</property>
<property>
  <name>metastore.thrift.port</name>
  <value>9850</value>
</property>
```
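One way to create the /presto/warehouse directory manually is through weed shell. This is a sketch, assuming the weed binary is on the PATH and a master is reachable at 10.0.100.51:9333 (adjust both to your deployment):

```shell
# Sketch: create the warehouse directory ahead of time via weed shell.
# Assumes weed is on PATH and the master address is 10.0.100.51:9333.
echo "fs.mkdir /presto/warehouse" | weed shell -master=10.0.100.51:9333
```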
Then follow these steps to set up Presto:
- Copy the seaweedfs-hadoop2-client-3.59.jar to the Presto hive-hadoop2 plugin directory, for example:

```bash
cp seaweedfs-hadoop2-client-3.59.jar /opt/presto-server-347/plugin/hive-hadoop2/
```
- Modify core-site.xml

Modify /opt/presto-server-347/etc/catalog/core-site.xml to support SeaweedFS; 30888 is the filer port.

```xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>seaweedfs://10.0.100.51:30888</value>
  </property>
  <property>
    <name>fs.seaweedfs.impl</name>
    <value>seaweed.hdfs.SeaweedFileSystem</value>
  </property>
  <property>
    <name>fs.AbstractFileSystem.seaweedfs.impl</name>
    <value>seaweed.hdfs.SeaweedAbstractFileSystem</value>
  </property>
  <property>
    <name>fs.seaweed.buffer.size</name>
    <value>4194304</value>
  </property>
</configuration>
```
- Modify hive.properties

hive.metastore.uri is the service address of the Hive Metastore deployed above; hive.config.resources points to the core-site.xml above.

```properties
connector.name=hive-hadoop2
hive.metastore.uri=thrift://10.0.100.51:9850
hive.allow-drop-table=true
hive.max-partitions-per-scan=1000000
hive.compression-codec=NONE
hive.config.resources=/opt/presto-server-347/etc/catalog/core-site.xml
```
- Modify config.properties

The default port of Presto is 8080. To change it, edit /opt/presto-server-347/etc/config.properties and update both http-server.http.port and discovery.uri.

```properties
coordinator=true
node-scheduler.include-coordinator=true
http-server.http.port=8080
query.max-memory=200GB
query.max-memory-per-node=8GB
query.max-total-memory-per-node=10GB
query.max-stage-count=200
task.writer-count=4
discovery-server.enabled=true
discovery.uri=http://10.0.100.51:8080
```
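As a sanity check on the memory settings above, query.max-memory-per-node should not exceed query.max-total-memory-per-node, which in turn should stay within the cluster-wide query.max-memory. The relationship can be verified with shell arithmetic (values in GB, taken from config.properties above):

```shell
# Values in GB, taken from config.properties above
query_max_memory=200    # query.max-memory (cluster-wide)
per_node=8              # query.max-memory-per-node
total_per_node=10       # query.max-total-memory-per-node

# Per-node user memory must fit inside per-node total memory,
# which should not exceed the cluster-wide limit.
[ "$per_node" -le "$total_per_node" ] && \
  [ "$total_per_node" -le "$query_max_memory" ] && \
  echo "memory limits are consistent"
```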
- Connect to Presto and create a table boshen

--server is the IP and port of the Presto service.

```bash
[root@cluster9 ~]# ./presto --server 10.0.100.51:8080 --catalog hive --schema default
presto:default> create table boshen(name varchar);
CREATE TABLE
presto:default>
```
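Before checking the Filer, a hypothetical follow-up is to write a row and read it back non-interactively with the CLI's --execute flag (same server address as above; the inserted value is arbitrary):

```shell
# Hypothetical follow-up: insert a row and read it back in one shot.
# Assumes the presto CLI and server address from the session above.
./presto --server 10.0.100.51:8080 --catalog hive --schema default \
  --execute "insert into boshen values ('seaweedfs'); select * from boshen;"
```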
- Verify that the boshen directory has been created under 10.0.100.51:30888/presto/warehouse

```bash
[root@cluster9 ~]# curl -H "Accept: application/json" "http://10.0.100.51:30888/presto/warehouse/?pretty=y"
{
  "Path": "presto/warehouse",
  "Entries": [
    {
      "FullPath": "/presto/warehouse/boshen",
      "Mtime": "2020-12-02T20:29:08+08:00",
      "Crtime": "2020-12-02T20:29:08+08:00",
      "Mode": 2147484159,
      "Uid": 0,
      "Gid": 0,
      "Mime": "",
      "Replication": "",
      "Collection": "",
      "TtlSec": 0,
      "UserName": "root",
      "GroupNames": [
        "root"
      ],
      "SymlinkTarget": "",
      "Md5": null,
      "FileSize": 0,
      "Extended": null,
      "HardLinkId": null,
      "HardLinkCounter": 0
    }
  ],
  "Limit": 100,
  "LastFileName": "boshen",
  "ShouldDisplayLoadMore": false
}
```