Hadoop for MySQL use cases
I have a database with ~4 million records of prices for US stocks, mutual funds and ETFs covering 5 years, and every day I add the daily price for each security.
For one feature I am working on, I need to fetch the latest price for each security (groupwise maximum) and do some calculations with other financial metrics. There are ~40K securities.
But the groupwise maximum over this amount of data is heavy and takes minutes to execute.
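For reference, the groupwise-maximum query described above has roughly the following shape. This is only a minimal sketch: the table `prices` and its columns `security_id`, `price_date` and `close_price` are hypothetical names (the real schema is not shown here), and Python's built-in `sqlite3` is used as a stand-in for MySQL so that the example is self-contained.

```python
import sqlite3


def latest_prices(conn):
    """Return {security_id: (price_date, close_price)} for the newest row of each security."""
    sql = """
        SELECT p.security_id, p.price_date, p.close_price
        FROM prices AS p
        JOIN (
            -- one row per security: the date of its most recent price
            SELECT security_id, MAX(price_date) AS max_date
            FROM prices
            GROUP BY security_id
        ) AS latest
          ON latest.security_id = p.security_id
         AND latest.max_date = p.price_date
        ORDER BY p.security_id
    """
    return {sid: (day, price) for sid, day, price in conn.execute(sql)}


# Hypothetical schema and sample rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE prices (security_id INTEGER, price_date TEXT, close_price REAL)"
)
# A composite index on (security_id, price_date) is assumed here.
conn.execute("CREATE INDEX idx_sec_date ON prices (security_id, price_date)")
conn.executemany(
    "INSERT INTO prices VALUES (?, ?, ?)",
    [
        (1, "2017-10-23", 100.0),
        (1, "2017-10-24", 101.5),
        (2, "2017-10-20", 54.0),
        (2, "2017-10-23", 55.0),
    ],
)

result = latest_prices(conn)
print(result)
```

The inner query finds the latest date per security and the join pulls the matching price row. The same join-on-aggregate pattern is valid SQL in MySQL as well, although its performance there depends on the actual indexes and table layout.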
Of course my tables use indexes, but the task involves fetching and processing nearly 7GB of data in real time.
So I would like to know: is this a task for Big Data tools and algorithms, or is this a small amount of data? In the examples I have seen, they work on thousands or millions of GBs.
My database is MySQL and I want to use Hadoop to process the data. Is that good practice, or should I rely on MySQL optimizations alone (is my data small)? If Hadoop is the wrong tool for this amount of data, what would you advise for this case?
NOTE: my data grows every day, and the project involves many analyses that need to be done in real time, on user request.
NOTE: I don't know whether this question is OK to ask on Stack Overflow, so I apologize if it is off-topic.
Thanks in advance!
Source: https://stackoverflow.com/questions/46915388