How to read the first line of a Hadoop (HDFS) file efficiently using Java?

I have a large CSV file on my Hadoop cluster. The first line of the file is a 'header' line, which consists of field names. I want to perform an operation on this header line, but I do not want to process the whole file. Also, my program is written in Java and uses Spark.
What is an efficient way to read just the first line of a large CSV file on a Hadoop cluster?
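
One possible approach, sketched here under assumptions rather than taken from the answer below, is to open the file as a stream and read only up to the first line terminator, so that only the first buffer of the file is fetched. The class name `HeaderReader` and the sample data are hypothetical. On a real cluster the stream would come from Hadoop's `FileSystem.open(Path)`, which returns an `FSDataInputStream` (a subclass of `java.io.InputStream`); the sketch uses an in-memory stream so it runs without a cluster.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: reads only the header line of a (possibly huge) text stream.
final class HeaderReader {

    /**
     * Returns the first line of the stream without its line terminator,
     * or null if the stream is empty. The stream is closed afterwards.
     * UTF-8 is an assumption about the file's encoding.
     */
    static String readFirstLine(InputStream in) {
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            // readLine() stops at the first line terminator; the rest of the
            // stream beyond the reader's internal buffer is never requested.
            return reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Stand-in for an HDFS stream; with Hadoop on the classpath this would be
        // something like fs.open(new Path(...)) for a FileSystem instance fs.
        InputStream in = new ByteArrayInputStream(
            "id,name,price\n1,apple,0.5\n".getBytes(StandardCharsets.UTF_8));
        String header = readFirstLine(in);
        System.out.println(String.join("|", header.split(","))); // prints id|name|price
    }
}
```

Because the reader is closed as soon as the first line is returned, the cost is independent of the file size, which is the property the question asks for.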

Source: https://stackoverflow.com/questions/21188788

Updated: 2022-11-06 19:11

Best answer

Yes, it is perfectly valid to have non-binary features.

The output falls between 0 and 1 because of the nature of the sigmoid function; nothing stops you from using a non-binary feature set.
Do the predictions have to be binary?
No, you can have multiclass logistic classification as well.

The simplest way of doing that is to solve a one-vs-all classification problem, in which you train one binary logistic classifier for each of the labels.
For example, if your prediction space spans (1, 2, 3, 4), you can have 4 logistic classifiers.
Given any point in the test set, you can give it the label corresponding to the classifier that is most confident (i.e. has the highest score for that test point).
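
The one-vs-all prediction step described above can be sketched as follows. This is a minimal illustration, not a trained model: the class name `OneVsAll`, the weights, and the biases are hypothetical, hand-picked values standing in for four already-trained binary logistic classifiers. Note that the features are real-valued, not binary.

```java
// Hypothetical one-vs-all predictor: one binary logistic classifier per label.
final class OneVsAll {

    /** Logistic (sigmoid) function; its output always lies in (0, 1). */
    static double sigmoid(double z) {
        return 1.0 / (1.0 + Math.exp(-z));
    }

    /** Confidence of a single binary classifier for the feature vector x. */
    static double score(double[] weights, double bias, double[] x) {
        double z = bias;
        for (int i = 0; i < x.length; i++) {
            z += weights[i] * x[i];
        }
        return sigmoid(z);
    }

    /** Returns the label whose classifier gives the highest score for x. */
    static int predict(int[] labels, double[][] weights, double[] biases, double[] x) {
        int best = 0;
        double bestScore = score(weights[0], biases[0], x);
        for (int k = 1; k < labels.length; k++) {
            double s = score(weights[k], biases[k], x);
            if (s > bestScore) {
                bestScore = s;
                best = k;
            }
        }
        return labels[best];
    }

    public static void main(String[] args) {
        int[] labels = {1, 2, 3, 4};
        // Assumed (not trained) parameters, one row per label.
        double[][] weights = {{1, 0}, {-1, 0}, {0, 1}, {0, -1}};
        double[] biases = {0, 0, 0, 0};
        double[] x = {2.0, 0.5}; // non-binary features
        System.out.println(predict(labels, weights, biases, x)); // prints 1
    }
}
```

In practice the per-label weights would come from fitting each binary classifier on "this label versus all others"; only the argmax over the sigmoid scores is shown here.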
