Presto and Hive partition discovery
I'm using Presto mainly with the Hive connector to connect to the Hive metastore.
All of my tables are external tables pointing to data stored in S3.
My main issue is that there is no way (at least none I'm aware of) to do partition discovery in Presto, so before I start querying a table in Presto I need to switch to Hive and run
msck repair table mytable
Is there a more reasonable way to do this in Presto?
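For context, MSCK REPAIR TABLE adds to the metastore any partitions that exist as Hive-style key=value directories under the table location but are not registered yet. The sketch below illustrates only that comparison step on a plain list of object keys. The function names and the key layout are hypothetical, it does not talk to S3 or the metastore, and it does not unescape Hive's percent-encoded partition values.

```python
def parse_partition(key, table_prefix):
    """Return the partition spec encoded in a key such as
    's3://bucket/mytable/dt=2015-12-01/part-0', or None if the key
    is not inside a key=value directory."""
    relative = key[len(table_prefix):].strip("/")
    directories = relative.split("/")[:-1]  # drop the file name
    spec = []
    for directory in directories:
        if "=" not in directory:
            return None  # not a Hive-style partition directory
        name, _, value = directory.partition("=")
        spec.append((name, value))
    return tuple(spec) if spec else None


def missing_partitions(object_keys, table_prefix, known):
    """Partitions present under table_prefix (which should end with '/')
    but absent from `known`, i.e. what a repair step would add."""
    found = set()
    for key in object_keys:
        if not key.startswith(table_prefix):
            continue
        spec = parse_partition(key, table_prefix)
        if spec is not None:
            found.add(spec)
    return sorted(found - set(known))


keys = [
    "s3://bucket/mytable/dt=2015-12-01/part-0",
    "s3://bucket/mytable/dt=2015-12-02/part-0",
    "s3://bucket/mytable/dt=2015-12-02/part-1",
    "s3://bucket/other/dt=2015-12-03/part-0",
]
registered = [(("dt", "2015-12-01"),)]
print(missing_partitions(keys, "s3://bucket/mytable/", registered))
```

Multiple files in the same directory collapse into one partition because the specs are collected in a set.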
Source: https://stackoverflow.com/questions/34478297
Best answer
Having a look at the structure (str) of df, you can see that boo is itself a data.frame:
> str(df)
'data.frame':	10 obs. of  2 variables:
 $ foo: Factor w/ 2 levels "ab","ac": 1 1 1 1 1 2 2 2 2 2
 $ boo:'data.frame':	10 obs. of  2 variables:
  ..$ X1: Factor w/ 1 level "a": 1 1 1 1 1 1 1 1 1 1
  ..$ X2: Factor w/ 2 levels "b","c": 1 1 1 1 1 2 2 2 2 2
So you can access them with df$boo$X1 and df$boo$X2.
If you want to append the boo columns instead, you can use cbind as follows:
df <- data.frame('foo' = rep(c('ab', 'ac'), each = 5))
# split each string into single characters and bind them as extra columns
df <- cbind(df, do.call('rbind', strsplit(as.character(df$foo), '', fixed = FALSE)))
names(df) <- c("foo", "boo_1", "boo_2")
which gives you
   foo boo_1 boo_2
1   ab     a     b
2   ab     a     b
3   ab     a     b
4   ab     a     b
5   ab     a     b
6   ac     a     c
7   ac     a     c
8   ac     a     c
9   ac     a     c
10  ac     a     c
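The same idea, splitting each string into single characters and appending one column per character, can be sketched without R. This is a minimal stdlib-Python illustration, assuming a table is represented as a list of dicts; the function name `split_into_columns` and its parameters are hypothetical. Unlike R's rbind, which recycles shorter rows, it raises an error when strings have different lengths.

```python
def split_into_columns(values, source="foo", prefix="boo"):
    """Split every string into single characters and return one dict per row:
    {source: original, prefix_1: first char, prefix_2: second char, ...}."""
    rows = []
    width = None
    for value in values:
        chars = list(value)  # analogous to strsplit(x, '') in R
        if width is None:
            width = len(chars)
        elif len(chars) != width:
            # R's rbind would recycle shorter rows; fail loudly instead
            raise ValueError(f"{value!r} has {len(chars)} characters, expected {width}")
        row = {source: value}
        for i, ch in enumerate(chars, start=1):
            row[f"{prefix}_{i}"] = ch
        rows.append(row)
    return rows


foo = ["ab"] * 5 + ["ac"] * 5  # same data as rep(c('ab', 'ac'), each = 5)
table = split_into_columns(foo)
for n, row in enumerate(table, start=1):
    print(n, row["foo"], row["boo_1"], row["boo_2"])
```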
Related questions
Accessing from a split column [2023-09-23]
Having a look at the structure (str) of df, you can see that boo is itself a data.frame. > str(df) 'data.frame': 10 obs. of 2 variables: $ foo: Factor w/ 2 levels "ab","ac": 1 1 1 1 1 2 2 2 2 2 $ boo:'data.frame': 10 obs. of 2 variables: ..$ X1: Factor w/ 1 level "a": 1 1 1 1 1 1 1 1 1 1 ..$ X2: ...
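Split Column with MySQL [2023-05-29]
If you are not sure how many phone numbers each record needs, you probably want a phone number table. It may be easier to run a PHP script to update the database (I assume you have not normalized your phone numbers to a specific format and that you use InnoDB): Create the phone number table: CREATE TABLE `user_phone` ( `userid` int(10) unsigned NOT NULL, `phone` char(15) NOT NULL, PRIMARY KEY (`userid`,`phone`), CONSTRAINT `fk_user_phone_ ...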
Splitting a column [2021-03-08]
Try this regex: df[['gender','phone_number','email']]=df['Contact'].str.\ extract('\(([A-Z])\)\s?(\d{3}-\d{3}-\d{4})?\s?(.*)', expand = False) df.drop('Contact', axis = 1, inplace = True) EmployeeID FirstName LastName MiddleName gender phone_numbe ...
Split column and generate dataframe [2023-12-19]
Assuming the delimiter is a semicolon: df.transpose <- t(as.data.frame(strsplit(df$col2, ';')))
Split column in Oracle [2022-04-17]
Without regexp, you need to repeat the same logic for every substring you need, each time choosing the start position and length from the position of that substring's "terminator". /* input data */ with yourTable(column1) as ( select '/opt/log/data/abcd.efghi.jklmn.aaa.txt' from dual union all select '/opt/log/data/abbbcd.efccghi.jkdsdflmn.abab.t ...
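for /f "tokens=1*delims=/" %%a in (inputfile.txt) do >>outputfile.txt echo %%a %%b move /y outputfile.txt inputfile.txt I suggest leaving out the second line at first and checking the result, so that if the processing is not what you want, the original file is not overwritten. If you like, you can replace the space between %%a and %%b with a comma. for /f "tokens=1*delims=/" %%a in (inputfile.txt) do >>outputfile.t ...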
How to split a column by pandas? [2022-02-22]
Use str.split: df1 = df['Col_A'].str.split('-', expand=True) df1.columns = ['Col_A1', 'Col_A2'] print (df1) Col_A1 Col_A2 0 18K 22K 1 6K 9K 2 10K 16K 3 15K 25K 4 5K 7K If you want to add the columns to the original df: df[['Col_A1', 'Col_A2']] = df['Col_A' ...
Pandas split column name [2022-11-22]
Here is another way. It assumes the low/high groups end in Low and High respectively, so we can use .str.endswith() to identify which rows are low and which are high. Here is the sample data: df = pd.DataFrame('group0Low group0High group1Low group1High routeLow routeHigh landmarkLow landmarkHigh'.split(), columns=['group_level']) df group_level 0 group0Low 1 gro ...
SQL split column [2023-01-23]
Just use case: select name_student, name_advisor, (case when money > 0 then money end) as money_positive, (case when money < 0 then money end) as money_negative from students s inner join advisors a on s.id_advisor = a.id_advisor; Notes: sub ...
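Order By Split Column Content [2020-11-02]
"I tried this solution, Order By Split Column, but it does not work when the numbers are not the same length." You can make them contain the same number of "." separators, i.e. the same number of tokens. For example, if you know there can be at most 4 dots (e.g. 1.1.1.1.1), you can run this script to append the missing ".0" tokens: create table mytable(id varchar); insert into mytable(id) values ('1'), ('1.1'), ('1.2'), ('1.2.1'), ('1.2.2'), ('1.19.1.1'), ('1.2. ...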
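A different technique from the ".0" padding above, shown here only as a sketch: if the ids can be sorted outside the database, converting each dotted id into a tuple of integers makes the components compare numerically. Python compares tuples element by element and sorts a shorter prefix first, so no padding is needed. The name `version_key` is hypothetical, and the sample values are the complete ones visible in the snippet.

```python
def version_key(dotted):
    """Turn '1.19.1.1' into (1, 19, 1, 1) so each component compares as a number."""
    return tuple(int(part) for part in dotted.split("."))


ids = ["1", "1.1", "1.2", "1.2.1", "1.2.2", "1.19.1.1"]
print(sorted(ids))                   # plain string order puts '1.19.1.1' before '1.2'
print(sorted(ids, key=version_key))  # ['1', '1.1', '1.2', '1.2.1', '1.2.2', '1.19.1.1']
```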