Presto and Hive partition discovery

I'm using Presto, mainly with the Hive connector, to connect to a Hive metastore.
All of my tables are external tables pointing to data stored in S3.
My main issue is that there is no way (at least none that I'm aware of) to do partition discovery in Presto, so before I start querying a table in Presto I need to switch to Hive and run msck repair table mytable
Is there a more reasonable way to do this in Presto?

Source: https://stackoverflow.com/questions/34478297

Updated: 2022-02-22 22:02

Best answer

Looking at the structure (str) of df, you can see that boo is itself a data.frame.
> str(df)
'data.frame':   10 obs. of  2 variables:
 $ foo: Factor w/ 2 levels "ab","ac": 1 1 1 1 1 2 2 2 2 2
 $ boo:'data.frame':    10 obs. of  2 variables:
  ..$ X1: Factor w/ 1 level "a": 1 1 1 1 1 1 1 1 1 1
  ..$ X2: Factor w/ 2 levels "b","c": 1 1 1 1 1 2 2 2 2 2
 
So you can access its columns with df$boo$X1 and df$boo$X2.
If you want to append the boo columns to df, you can use cbind as follows:
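# Build the example data: "ab" five times, then "ac" five times
df <- data.frame(foo = rep(c("ab", "ac"), each = 5))
# Split each string into single characters and bind them as new columns
df <- cbind(df, do.call(rbind, strsplit(as.character(df$foo), "")))
names(df) <- c("foo", "boo_1", "boo_2")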
 
This gives you:
   foo boo_1 boo_2
1   ab     a     b
2   ab     a     b
3   ab     a     b
4   ab     a     b
5   ab     a     b
6   ac     a     c
7   ac     a     c
8   ac     a     c
9   ac     a     c
10  ac     a     c
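
For reference, the nested structure reported by str above can be reproduced with a short sketch. It assumes that boo was created by assigning a whole data.frame to a single column; the original code that built df is not shown here, so this is a guess at how it arose, and the helper name parts is hypothetical.

```r
# Assumption: boo was created by assigning a data.frame to one column of df.
df <- data.frame(foo = rep(c("ab", "ac"), each = 5), stringsAsFactors = TRUE)

# Split each string into single characters; rbind stacks them into a 10 x 2 matrix.
parts <- do.call(rbind, strsplit(as.character(df$foo), ""))

# Assigning a data.frame to df$boo nests it inside df instead of adding two columns.
df$boo <- data.frame(parts, stringsAsFactors = TRUE)

str(df)      # boo is listed as a 'data.frame' with columns X1 and X2
df$boo$X1    # first character of each foo value
df$boo$X2    # second character of each foo value
```

The nested columns are reached through two levels of $, which is why appending them with cbind gives a flatter and easier to use result.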
