Presto and Hive partition discovery

I'm using Presto mainly with the Hive connector, connected to a Hive metastore.

All of my tables are external tables pointing to data stored in S3.

My main issue is that there is no way (at least none that I'm aware of) to do partition discovery in Presto, so before I start querying a table in Presto I need to switch to Hive and run msck repair table mytable.

Is there a more reasonable way to do this in Presto?


Source: https://stackoverflow.com/questions/34478297
Updated: 2022-02-22 22:02

Best answer

Looking at the structure (str) of df, you can see that boo is itself a data.frame.

> str(df)
'data.frame':   10 obs. of  2 variables:
 $ foo: Factor w/ 2 levels "ab","ac": 1 1 1 1 1 2 2 2 2 2
 $ boo:'data.frame':    10 obs. of  2 variables:
  ..$ X1: Factor w/ 1 level "a": 1 1 1 1 1 1 1 1 1 1
  ..$ X2: Factor w/ 2 levels "b","c": 1 1 1 1 1 2 2 2 2 2

So you can access its columns with df$boo$X1 and df$boo$X2.

If you want to append the boo columns to df as regular columns, you can use cbind as follows:

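# Build the example data frame
df <- data.frame('foo' = rep(c('ab','ac'), each = 5))
# Split each string into single characters and bind the pieces on as new columns
df <- cbind(df, do.call('rbind', strsplit(as.character(df$foo),'',fixed=FALSE)))
names(df) <- c("foo", "boo_1", "boo_2")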

which gives you

   foo boo_1 boo_2
1   ab     a     b
2   ab     a     b
3   ab     a     b
4   ab     a     b
5   ab     a     b
6   ac     a     c
7   ac     a     c
8   ac     a     c
9   ac     a     c
10  ac     a     c
