
When using LZO on Hadoop output on AWS EMR, does it index the files (stored on S3) for future automatic splitting?


I want to use LZO compression on my Elastic Map Reduce job's output that is being stored on S3, but it is not clear if the files are automatically indexed so that future jobs run on this data will split the files into multiple tasks.

For example, if my output is a bunch of lines of TSV data in a 1GB LZO file, will a future map job create only 1 task, or something like (1GB/blockSize) tasks (i.e. the behavior when files are not compressed, or when there is an LZO index file in the directory)?

Edit: If this is not done automatically, what is recommended for getting my output to be LZO-indexed? Do the indexing before uploading the file to S3?
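To make the two outcomes in the 1GB example concrete, here is a minimal arithmetic sketch. It assumes a hypothetical 64 MiB split size and a helper name (`estimated_map_tasks`) invented for illustration. It only models the question's own "1 task vs. (1GB/blockSize) tasks" comparison and says nothing about what EMR actually does with the file.

```python
def estimated_map_tasks(file_size_bytes: int, split_size_bytes: int, splittable: bool) -> int:
    """Rough estimate of map tasks for one input file (hypothetical helper).

    A file that cannot be split is read by a single task; a splittable
    file is divided into roughly ceil(size / split_size) pieces.
    """
    if not splittable:
        return 1
    # Integer ceiling division, avoiding floating-point rounding.
    return -(-file_size_bytes // split_size_bytes)


GIB = 1024 ** 3
MIB = 1024 ** 2

# Assumed split size of 64 MiB; the real value depends on cluster configuration.
print(estimated_map_tasks(GIB, 64 * MIB, splittable=False))
print(estimated_map_tasks(GIB, 64 * MIB, splittable=True))
```

Real split counts can differ from this estimate, since the input format decides where split boundaries fall.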


Source: https://stackoverflow.com/questions/13019996
Updated: 2022-03-26 22:03

Best answer

$link should be a Simple HTML Element object, of which you can access attributes using $link->href and the text contents as $link->plaintext. See http://simplehtmldom.sourceforge.net/manual.htm.
