bits on data


Originally Posted on

TL;DR: The Hive connector is what you use in Trino for reading data from object storage that is organized according to the rules laid out by Hive, without using the Hive runtime code.

One of the most confusing aspects when starting Trino is the Hive connector. Typically, you seek out the use of Trino when you experience an intensely slow query turnaround from your existing Hadoop, Spark, or Hive infrastructure. In fact, the genesis of Trino came about due to these slow Hive query conditions at Facebook back in 2012.

So when you learn that Trino has a Hive connector, it can be rather confusing since you moved to Trino to circumvent the slowness of your current Hive cluster. Another common source of confusion is when you want to query your data from your cloud object storage, such as AWS S3, MinIO, and Google Cloud Storage. This too uses the Hive connector. If that confuses you, don’t worry, you are not alone. This blog aims to explain this commonly confusing nomenclature.