tshark export FIX messages
The Objective
I'm trying to achieve the following:
- capture network traffic containing a conversation in the FIX protocol
- extract the individual FIX messages from the network traffic into a "nice" format, e.g. CSV
- do some data analysis on the exported "nice" format data
I have achieved this by:
- using pcap to capture the network traffic
- using tshark to print the relevant data as a CSV
- using Python (pandas) to analyse the data
The Problem
The problem is that some of the captured TCP packets contain more than one FIX message, which means that when I do the export to CSV using tshark I don't get a FIX message per line. This makes consuming the CSV difficult.
This is the tshark command line I'm using to extract the relevant FIX fields as CSV:
tshark -r dump.pcap \
  -R '(fix.MsgType[0]=="G" or fix.MsgType[0]=="D" or fix.MsgType[0]=="8" or fix.MsgType[0]=="F") and fix.ClOrdID != "0"' \
  -Tfields -Eseparator=, -Eoccurrence=l \
  -e frame.time_relative -e fix.MsgType -e fix.SenderCompID \
  -e fix.SenderSubID -e fix.Symbol -e fix.Side \
  -e fix.Price -e fix.OrderQty -e fix.ClOrdID \
  -e fix.OrderID -e fix.OrdStatus
Note that I'm currently using "-Eoccurrence=l" to get just the last occurrence of a named field in the case where there is more than one occurrence of a field in the packet. This is not an acceptable solution as information will get thrown away when there are multiple FIX messages in a packet.
This is what I expect to see per line in the exported CSV file (fields from one FIX message):
16.508949000,D,XXX,XXX,YTZ2,2,97480,34,646427,,
This is what I see when there is more than one FIX message in a TCP packet (three in this case) and the command-line flag "-Eoccurrence=a" is used:
16.515886000,F,F,G,XXX,XXX,XXX,XXX,XXX,XXX,XTZ2,2,97015,22,646429,646430,646431,323180,323175,301151,
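One hedged workaround (my sketch, not part of the original post): tshark's `-E aggregator=<char>` option changes the character used to join multiple occurrences of a field, so exporting with `-Eoccurrence=a -Eaggregator='|'` keeps the comma free for the column separator, and the aggregated row can be exploded back into one row per message in Python. The big assumption, flagged here deliberately, is that every requested field occurs exactly once per FIX message; absent optional tags (e.g. `fix.OrdStatus`) will misalign the columns, which is the fragility the question is really about.

```python
import csv
import sys

def explode(row):
    """Split one tshark packet row (fields |-aggregated via -Eaggregator='|')
    into one row per FIX message. frame.time_relative is per packet, so it
    is repeated on every emitted row. Missing occurrences become "".
    """
    time = row[0]
    cols = [c.split("|") for c in row[1:]]
    n = max(len(c) for c in cols)
    for i in range(n):
        yield [time] + [c[i] if i < len(c) else "" for c in cols]

if __name__ == "__main__":
    # Pipe the tshark CSV output through this script.
    writer = csv.writer(sys.stdout)
    for row in csv.reader(sys.stdin):
        for msg_row in explode(row):
            writer.writerow(msg_row)
```

For example, the three-message row above would come out as three rows sharing one timestamp, rather than one unreadable wide row.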
The Question
Is there a way (not necessarily using tshark) to extract each individual, protocol specific message from a pcap file?
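One possible direction, sketched here as an illustration rather than a definitive answer: sidestep per-field extraction entirely and split the reassembled TCP payload on FIX message boundaries. In FIX tagvalue encoding every message starts with `8=FIX` and ends with the checksum field `10=nnn` followed by SOH (`\x01`), so a non-greedy regex can carve out individual messages; the payload itself could come from e.g. `tshark -T fields -e tcp.payload` (hex-decoded) or a capture library, which is assumed rather than shown.

```python
import re

SOH = "\x01"
# One FIX message: from "8=FIX" up to and including "10=nnn<SOH>".
# Non-greedy .*? stops at the first checksum, keeping messages separate.
FIX_MSG = re.compile(r"8=FIX.*?\x0110=\d{3}\x01")

def split_fix_messages(payload: str):
    """Return each complete FIX message found in a reassembled TCP payload."""
    return FIX_MSG.findall(payload)

def fix_to_dict(message: str):
    """Parse one FIX message into a {tag: value} dict."""
    fields = (f.split("=", 1) for f in message.strip(SOH).split(SOH))
    return {tag: value for tag, value in fields}

# Example: two messages glued together in one TCP segment.
raw = ("8=FIX.4.2" + SOH + "35=D" + SOH + "11=646427" + SOH + "10=123" + SOH +
       "8=FIX.4.2" + SOH + "35=F" + SOH + "11=646429" + SOH + "10=045" + SOH)
msgs = split_fix_messages(raw)
```

With messages isolated, building the CSV (one message per line) from the resulting dicts is straightforward in pandas.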
Source: https://stackoverflow.com/questions/13810156