Why is there no 'hadoop fs -head' shell command?

A fast method for inspecting files on HDFS is to use tail: 
~$ hadoop fs -tail /path/to/file
 
This displays the last kilobyte of data in the file, which is extremely helpful. However, the opposite command, head, does not appear to be part of the shell command collection, which I find surprising.
My hypothesis is that, since HDFS is built for very fast streaming reads on very large files, some access-related issue affects head. That makes me hesitant to work around it and read the start of a file myself. Does anyone have an answer?
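
For context, a commonly used workaround is to stream the file with -cat and let the local head cut the stream off. The sketch below wraps that in a helper; the name hdfs_head and its default of 1024 bytes are assumptions made for illustration, not part of Hadoop, and it relies on a head that supports -c (GNU and BSD versions do).

```shell
# Hypothetical helper (not a Hadoop command): print the first N bytes of an HDFS file.
hdfs_head() {
    path="$1"
    bytes="${2:-1024}"   # assumed default, chosen to mirror the 1 KB shown by -tail
    # head exits after N bytes; the closed pipe then typically stops the upstream cat,
    # so the whole file is not read.
    hadoop fs -cat "$path" | head -c "$bytes"
}

# Usage: hdfs_head /path/to/file 2048
```

Because the read is sequential from the start of the file, this stays within the streaming access pattern the question mentions.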

Source: https://stackoverflow.com/questions/19778137

Updated: 2023-09-04 17:09

Best answer

 
 If we moved all this over to GitHub, would each user have to download the 250 MB, or would they have to download 1 GB or more to get the full history of the repository?
 
When cloning for the first time, each user would have to retrieve the whole repository. However, the Git server-side implementation would send a "compressed" version of the repository as a packfile, so the transmitted data would weigh much less than 1 GB.
Each successive fetch/pull operation would only retrieve the new Git objects (commits, trees and blobs) that the server knows about and that are not already in the client's local repository. Those would also be sent over the wire as a packfile.
 
Although @akonsu is correct in stating that you can clone a shallow version of your repository (i.e. without the whole history), that would prevent the user from further interacting with a GitHub-hosted main upstream repository.
Indeed, the git clone documentation states: "A shallow repository has a number of limitations (you cannot clone or fetch from it, nor push from nor into it)"
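
To make the difference between a full and a shallow clone concrete, here is a minimal sketch that assumes git is installed. The repository, directory names, and commit messages are throwaway values invented for the demonstration; a file:// URL is used because --depth is not honoured for plain local paths.

```shell
# Build a throwaway repository with three commits (all names are illustrative).
work=$(mktemp -d)
git init -q "$work/origin"
cd "$work/origin"
for i in 1 2 3; do
    echo "revision $i" > file.txt
    git add file.txt
    git -c user.name=demo -c user.email=demo@example.com commit -q -m "commit $i"
done
cd "$work"

# Full clone: the whole history is transferred as a packfile.
git clone -q "file://$work/origin" full

# Shallow clone: history is truncated to the most recent commit.
git clone -q --depth 1 "file://$work/origin" shallow

# Count the commits reachable from HEAD in each clone.
full_count=$(git -C full rev-list --count HEAD)
shallow_count=$(git -C shallow rev-list --count HEAD)
echo "full: $full_count commits, shallow: $shallow_count commit(s)"
```

Running git count-objects -vH inside each clone shows how much pack data ended up on disk, which is one way to estimate what a first clone costs.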
