Apache Sqoop and Spark

To load large SQL data into Spark for transformation and ML, which of the options below performs better?
 Option 1: Use the Spark SQL JDBC connector to load the SQL data directly into Spark.
 Option 2: Use Sqoop to load the SQL data into HDFS in CSV format, then use Spark to read the data from HDFS.
Please suggest which of the above is a good approach for loading large SQL data into Spark.
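
For concreteness, here is a rough sketch of what the two options might look like from PySpark. It is only an illustration under assumptions: the helper names (`jdbc_read_options`, `load_via_jdbc`, `sqoop_import_command`, `load_csv_from_hdfs`), the JDBC URL, the table name, the split column and the HDFS path are all hypothetical and not taken from the question. The Spark JDBC option keys and the Sqoop flags are the standard ones, but the right values (partition bounds, fetch size, mapper count) depend on the table and cluster.

```python
def jdbc_read_options(url, table, partition_column, lower_bound, upper_bound,
                      num_partitions, fetch_size=10000):
    """Option 1: build options for a partitioned Spark JDBC read.

    Without partitionColumn/lowerBound/upperBound/numPartitions, Spark reads
    the whole table through a single JDBC connection.
    """
    return {
        "url": url,
        "dbtable": table,
        "partitionColumn": partition_column,  # numeric, date or timestamp column
        "lowerBound": str(lower_bound),
        "upperBound": str(upper_bound),
        "numPartitions": str(num_partitions),  # also caps concurrent connections
        "fetchsize": str(fetch_size),          # rows fetched per round trip
    }


def load_via_jdbc(spark, options):
    """Option 1: read the table straight into a DataFrame (spark is a SparkSession)."""
    return spark.read.format("jdbc").options(**options).load()


def sqoop_import_command(url, table, target_dir, split_by, num_mappers):
    """Option 2, step 1: argument list for a Sqoop import to comma-delimited text."""
    return [
        "sqoop", "import",
        "--connect", url,
        "--table", table,
        "--target-dir", target_dir,
        "--split-by", split_by,             # column used to divide work among mappers
        "--num-mappers", str(num_mappers),
        "--fields-terminated-by", ",",
    ]


def load_csv_from_hdfs(spark, path, schema):
    """Option 2, step 2: read the imported files, with an explicit schema
    so Spark does not need an extra pass over the data to infer types."""
    return spark.read.schema(schema).csv(path)
```

With a running `SparkSession` named `spark`, option 1 would be something like `load_via_jdbc(spark, jdbc_read_options("jdbc:mysql://db.example.com/sales", "orders", "id", 1, 1000000, 8))`. For option 2, the list from `sqoop_import_command(...)` could be passed to `subprocess.run` on a machine with Sqoop installed, followed by `load_csv_from_hdfs(spark, "hdfs:///data/orders", "id INT, amount DOUBLE")`.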

Source: https://stackoverflow.com/questions/33771776

Updated: 2023-10-05 09:10

