tooptoop4 commented Nov 21, 2018. Since Parquet contains column metadata, I am thinking there is no need to set up Hive; Presto could query the Parquet files on S3 directly. Please improve the Parquet connector to not require Hive #12955.
Sep 25, 2020 · Ahana Cloud for Presto can tap into AWS data sources such as Amazon S3, Amazon Elasticsearch, and Amazon Relational Database Service (Amazon RDS) for MySQL, PostgreSQL, and others.

Nov 10, 2020 · Presto is an open source distributed SQL query engine for running interactive analytic queries against data sources of all sizes. Facebook started developing it in 2012 to run queries against its 300 PB data warehouse, and Presto can handle petabyte-sized warehouses without difficulty.
Apr 19, 2020 · At the time of writing, we have 6 GB of Parquet files stored in S3 using this system. That's ~130 million events. The monthly costs to run this are wonderfully low: S3 storage: $0.80; Athena: $0.30 (varies based on your query volume); Glue: $0.50; Kinesis Firehose: $0.50.
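As a quick check on the figures quoted above, the monthly total can be added up directly. This is only arithmetic on the numbers in the snippet; the Athena line varies with query volume, so the total is indicative rather than fixed.

```python
from decimal import Decimal

# Monthly costs in USD, copied from the figures quoted above.
monthly_costs = {
    "S3 storage": Decimal("0.80"),
    "Athena": Decimal("0.30"),  # varies with query volume
    "Glue": Decimal("0.50"),
    "Kinesis Firehose": Decimal("0.50"),
}

# Decimal avoids binary floating-point rounding when summing currency values.
total = sum(monthly_costs.values(), Decimal("0"))
print(f"Total monthly cost: ${total}")
```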

But one of the interesting things about how they use it is they query their data living in Amazon's S3 buckets. These are essentially network drives and the data there are stored in flat files. So as I mentioned, you can connect Presto to almost any type of data or data store. That includes flat files living on a network share.
Oct 20, 2020 · Typically, you look for an S3 connector, a GCS connector or a MinIO connector. All you need is the Hive connector and the HMS to manage the metadata of the objects in your storage. The Hive Metastore Service. The HMS is the only Hive process used in the entire Presto ecosystem when using the Hive connector.
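To make the role of the Hive connector and the HMS concrete, here is a minimal sketch that builds the kind of `CREATE TABLE` statement used to register Parquet objects in S3 as an external table. The `format` and `external_location` table properties belong to the Hive connector; the catalog name `hive`, the schema, table, column names, and the bucket path are hypothetical placeholders, and the helper does no quoting or validation of its inputs.

```python
def external_parquet_table_sql(catalog, schema, table, columns, location):
    """Build a CREATE TABLE statement for an external Parquet table.

    `columns` is a list of (name, sql_type) pairs. All names passed in are
    assumed to be trusted; nothing is escaped in this sketch.
    """
    column_defs = ",\n  ".join(f"{name} {sql_type}" for name, sql_type in columns)
    return (
        f"CREATE TABLE {catalog}.{schema}.{table} (\n"
        f"  {column_defs}\n"
        f")\n"
        f"WITH (\n"
        f"  format = 'PARQUET',\n"
        f"  external_location = '{location}'\n"
        f")"
    )


# Hypothetical example: an events table backed by objects in an S3 bucket.
sql = external_parquet_table_sql(
    "hive",
    "default",
    "events",
    [("event_id", "varchar"), ("event_time", "timestamp")],
    "s3a://example-bucket/events/",
)
print(sql)
```

Running the statement through a Presto client stores the table definition in the metastore; the data files themselves stay in the bucket.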
Nov 25, 2020 · Presto is compatible with the ANSI SQL standard and can connect to multiple RDBMS and data warehouse data sources, using the same SQL syntax and SQL functions on these data sources. It also brings SQL execution to storage systems that have no SQL execution capabilities of their own.
tpch catalog using the Presto TPCH Connector: a read-only catalog that defines the standard TPCH schema. memory catalog using the Presto Memory Connector: this catalog can be used for creating schemas and tables and does not require any storage, as everything is stored fully in memory.
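The two catalogs can be combined: read generated data from `tpch` and materialize it in `memory`. The sketch below only assembles the SQL text. It assumes the catalogs are named `tpch` and `memory` as in the description above, and that the `tiny` schema of the TPCH connector and the `default` schema of the Memory connector are available; the helper name is hypothetical.

```python
# A read-only query against the generated TPCH data.
TPCH_SAMPLE_QUERY = (
    "SELECT name, regionkey FROM tpch.tiny.nation ORDER BY name LIMIT 5"
)


def copy_to_memory_sql(source_table, target_table):
    """Build a CREATE TABLE AS statement copying a TPCH table into memory."""
    return (
        f"CREATE TABLE memory.default.{target_table} AS "
        f"SELECT * FROM tpch.tiny.{source_table}"
    )


ctas = copy_to_memory_sql("nation", "nation_copy")
print(TPCH_SAMPLE_QUERY)
print(ctas)
```

Tables created in the memory catalog are not persisted, so they disappear when the cluster restarts.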
Solving query optimization in Presto: By combining machine learning and adaptive query execution, query optimization in Presto could become smarter and more efficient over repeated use.
Apache Pinot is a realtime distributed OLAP datastore, which is used to deliver scalable real time analytics with low latency. It can ingest data from batch data sources (such as HDFS, S3, Azure Data Lake, Google Cloud Storage) as well as streaming sources (such as Kafka). Pinot is designed to scale horizontally, so that it can scale to larger data sets and higher query rates as needed.
Jul 31, 2020 · There are two primary ways for Presto clusters to get access to data stored in S3: (1) an IAM role via the instance the Presto servers are running on; (2) an Access Key / Secret Key provided via the Hive connector properties file. IAM role (recommended approach): if using an IAM role, Presto needs to be configured using hive.s3.use-instance-credentials=true
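The two approaches differ only in which lines appear in the Hive catalog properties file. The following sketch renders that file for either case. The property names `hive.metastore.uri`, `hive.s3.use-instance-credentials`, `hive.s3.aws-access-key`, and `hive.s3.aws-secret-key` are Hive connector settings; the connector name `hive-hadoop2` is an assumption that matches PrestoDB distributions, and the metastore hostname is a hypothetical placeholder.

```python
def hive_catalog_properties(metastore_uri, access_key=None, secret_key=None):
    """Render the text of a Hive catalog properties file for S3 access.

    With no keys given, the instance's IAM role is used (the recommended
    approach); otherwise the static access key and secret key are written out.
    """
    lines = [
        "connector.name=hive-hadoop2",  # assumed connector name (PrestoDB)
        f"hive.metastore.uri={metastore_uri}",
    ]
    if access_key and secret_key:
        lines.append(f"hive.s3.aws-access-key={access_key}")
        lines.append(f"hive.s3.aws-secret-key={secret_key}")
    else:
        lines.append("hive.s3.use-instance-credentials=true")
    return "\n".join(lines) + "\n"


# Hypothetical metastore address; IAM role variant.
iam_props = hive_catalog_properties("thrift://metastore.example.internal:9083")
print(iam_props)
```

Keeping static keys out of the properties file is the main reason the IAM role variant is preferred: nothing secret is stored on disk.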

Presto is a high performance, distributed SQL query engine for big data. Its architecture allows users to query a variety of data sources such as Hadoop, AWS S3, Alluxio, MySQL, Cassandra, Kafka...
Mainly two things: (1) separation of compute and storage, and (2) query federation. 1. Separation of compute and storage means you can decide how much you want to spend on compute (servers on EC2) based on budget and query SLAs (desired response...

