Hadoop and MapReduce
I am new to HDFS and MapReduce and am trying to calculate survey statistics. The input file is in this format: Age Points Sex Category (all four are numbers). Is this a correct start?
    public static class MapClass extends MapReduceBase
            implements Mapper<IntWritable, IntWritable, IntWritable, IntWritable> {
        private final static IntWritable Age = new IntWritable(1);
        private IntWritable AgeCount = new IntWritable();

        public void map(Text key, Text value,
                        OutputCollector<IntWritable, IntWritable> output,
                        Reporter reporter) throws IOException {
            AgeCount.set(Integer.parseInt(value.toString()));
            output.collect(AgeCount, Age);
        }
    }
My questions:
1. Is this a correct start?
2. If I want to collect other attributes such as Sex and Points, do I just add more output.collect statements? I know I have to read the line and split it into attributes.
3. Where it says implements Mapper, I made all 4 type parameters IntWritable. Is that correct?
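A rough sketch of the "read the line and split into attributes" step from question 2, written in plain Java so it runs without a Hadoop installation. The class name SurveyMapSketch, the method mapLine, the "Field:value" key format, and the whitespace delimiter are all assumptions made for illustration; none of them come from the question. In a real job using the old org.apache.hadoop.mapred API with the default TextInputFormat, the map input key is a LongWritable (the byte offset of the line) and the input value is a Text (the line itself), and the four Mapper type parameters are input key, input value, output key, output value, so they have to agree with the map signature rather than all being IntWritable.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class SurveyMapSketch {
    // Column names in the order given in the question.
    static final String[] FIELDS = {"Age", "Points", "Sex", "Category"};

    // Stand-in for map(): takes one input line and returns one
    // ("Field:value", 1) pair per attribute, which a reducer could sum.
    // Assumes the four numbers are separated by whitespace (hypothetical).
    static List<Map.Entry<String, Integer>> mapLine(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        String[] parts = line.trim().split("\\s+");
        if (parts.length != FIELDS.length) {
            return out; // skip lines that do not have exactly four columns
        }
        for (int i = 0; i < FIELDS.length; i++) {
            // Throws NumberFormatException if a column is not an integer.
            int value = Integer.parseInt(parts[i]);
            out.add(new SimpleEntry<>(FIELDS[i] + ":" + value, 1));
        }
        return out;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, Integer> pair : mapLine("34 120 1 3")) {
            System.out.println(pair.getKey() + "\t" + pair.getValue());
        }
    }
}
```

In an actual Mapper, each pair added to the list would instead be one output.collect(key, value) call, so emitting several attributes per line does mean several collect calls, typically with a key that records which attribute the count belongs to.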
Source: https://stackoverflow.com/questions/5698693
Best answer
If it is a file, you need to read it.
So use read.zoo() as you do, but then convert right away:
    gold <- as.xts(read.zoo("GOLD.CSV", sep=",", format="%m/%d/%Y", header=TRUE))
Ok?
Related Q&A
-
One can specify the class as "ANY": test = setRefClass(Class = "test", fields = c(edata = "ANY")). Then one can assign an "xts" object to "edata".
-
Regressions with xts in R [2022-06-23]
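The dyn and dynlm packages can do this with zoo objects. In the case of dyn, just write dyn$lm instead of lm and pass it a zoo object instead of a data frame. Note that lag in xts works the opposite way from the usual R convention, so lag(x, 1) when x is of class xts is the same as lag(x, -1) when x is of class zoo or ts. > library(xts) > library(dyn) > x <- xts(anscombe[c("y1", "x1")], as.Date(1:11)) # test data > dyn$lm(y1 ~ lag(x1, -(1: ...
-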
Use read.zoo with the split argument: z <- read.zoo(dta, split = 1, index = 2); as.xts(z)
-
apply to an xts object in R [2022-10-28]
This will add a column with the mean of each row: obj <- cbind(obj, rowMeans(obj)). EDIT to clarify why cbind works here: I am using cbind.xts, which is a call to merge.xts. So the above is equivalent to: obj <- merge(obj, rowMeans(obj))
-
I hate to say it, but the answer is no. An xts object is essentially a matrix which has been indexed by dates. The closest thing you'll have to words are the column names.
-
You can use the ymd function from the lubridate package to convert strings to dates. Then you can use tk_xts from timetk: library(dplyr) library(timetk) library(lubridate) mydata %>% mutate(Date = ymd(Date)) %>% tk_xts(select = Sales)
-
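It is not true that "the xts function reads the first few timestamps as the origin date of 1970/01/01". xts orders all the timestamps in the file. If any of them are zero, they will be the first observations in the xts object. I suspect the data in your CSV is not what you expect. In the file "split_ab.csv", rows 23669 and 23670 have a timestamp of 0. 1442558305629290858 12247553 15025 8 7 15030 5 3 15020 12 11 15035 16 16 15015 20 18 15040 21 18 15010 27 2 ...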
-
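I believe all you need is to join each list element by date. To do that, however, you first need to rename all of those PX_LAST variables to something unique. For example: require(data.table) for (i in 1:length(res)) { setnames(res[[i]],"PX_LAST",paste("PX_LAST",i,sep="_")) } Then you can join them through pairwise merge or by using the plyr::join_all function: require(plyr) df <- join_all(res, by="date", type=" ...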
-
R Create a time sequence as xts index based on two columns in data.frame [2023-01-18]
I don't know how useful this is more than a month after you found a working solution, but I trimmed your code down to something more compact anyway. library(dplyr) df <- structure(list(soc_sec = c("AA2105480", "AA2105480", "AB4378973", "AB4990257", "AB7777777", "AB7777777", "AB7777777", "AC4285291", "AC4285291", "AC6039874", "AC6039874", "AC6039874" ...