Matching unmatched strings based on an unknown pattern

Alright guys, I really hurt my brain over this one and I'm curious if you guys can give me any pointers towards the right direction I should be taking.

The situation is this:

Let's say I have a collection of strings (to be clear, the pattern of these strings is unknown. What I can say for a fact is that the strings contain only characters from the ASCII table, so I don't have to worry about weird Chinese signs).

For this example, I take the following collection of strings (note that the strings don't have to make any human sense so don't try figuring them out :)):

"[001].[FOO].[TEST] - 'foofoo.test'",  
"[002].[FOO].[TEST] - 'foofoo.test'",  
"[003].[FOO].[TEST] - 'foofoo.test'",  
"[001].[FOO].[TEST] - 'foofoo.test.sample'",  
"[002].[FOO].[TEST] - 'foofoo.test.sample'",    
"-001- BAR.[TEST] - 'bartest.xx1",  
"-002- BAR.[TEST] - 'bartest.xx1"  

Now, what I need is a way of finding logical groups (and subgroups) in this set of strings. In the above example, just by rational thinking, you can combine the first 3, the 2 after that, and the last 2. The groups resulting from the first 5 can also be combined into one main group with 2 subgroups, which should give you something like this:

{
    {
        "[001].[FOO].[TEST] - 'foofoo.test'",  
        "[002].[FOO].[TEST] - 'foofoo.test'",  
        "[003].[FOO].[TEST] - 'foofoo.test'",  
    }
    {
        "[001].[FOO].[TEST] - 'foofoo.test.sample'",  
        "[002].[FOO].[TEST] - 'foofoo.test.sample'",    
    }
}
{
    {
        "-001- BAR.[TEST] - 'bartest.xx1",  
        "-002- BAR.[TEST] - 'bartest.xx1"  
    }
}
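To make the target structure concrete, here is a minimal Python sketch that happens to reproduce the nesting above for these seven strings. It rests on two assumptions that are not part of the question: that subgroup members differ only in runs of digits (so masking digits with `#` yields a shared template), and that main groups can be found by comparing templates with `difflib.SequenceMatcher` against an arbitrary threshold of 0.8. The names `template` and `two_level_groups` are made up for illustration.

```python
import re
from difflib import SequenceMatcher

STRINGS = [
    "[001].[FOO].[TEST] - 'foofoo.test'",
    "[002].[FOO].[TEST] - 'foofoo.test'",
    "[003].[FOO].[TEST] - 'foofoo.test'",
    "[001].[FOO].[TEST] - 'foofoo.test.sample'",
    "[002].[FOO].[TEST] - 'foofoo.test.sample'",
    "-001- BAR.[TEST] - 'bartest.xx1",
    "-002- BAR.[TEST] - 'bartest.xx1",
]

def template(s):
    # Assumption: only digit runs vary inside a subgroup, so mask them.
    return re.sub(r"\d+", "#", s)

def two_level_groups(strings, threshold=0.8):
    # Level 2 (subgroups): strings that share the same digit-masked template.
    subgroups = {}
    for s in strings:
        subgroups.setdefault(template(s), []).append(s)

    # Level 1 (main groups): greedily attach each template to the first
    # main group that already holds a sufficiently similar template.
    main = []
    for t in subgroups:
        for group in main:
            if any(SequenceMatcher(None, t, u).ratio() >= threshold for u in group):
                group.append(t)
                break
        else:
            main.append([t])

    # Replace templates by the strings they stand for.
    return [[subgroups[t] for t in group] for group in main]

if __name__ == "__main__":
    for group in two_level_groups(STRINGS):
        print(group)
```

The threshold is a tuning knob rather than a principled value, and digit masking will not help when the varying part is not numeric, so treat this as a starting point only.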

Sorry for the layout above but indenting with 4 spaces doesn't seem to work correctly (or I'm frakk'n it up).

Anyway, I'm not sure how to approach this problem (how to get the result desired as indicated above).

First off, I thought of creating a huge set of regexes that would parse most known patterns, but the number of different patterns is just too huge for this to be realistic.

Another thing I thought of was parsing each individual word within a string (so split on all non-alphanumeric characters and drop them), and if X% of the words match, I can assume the strings belong to the same group (where X will probably be around 80/90). However, I find the area of speculation kinda big. For example, when matching strings of 20 words each, the chance of hitting above 80% is kinda big (that means 4 words can differ), whereas when matching only 8 words, 2 words at most can differ.
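The word-matching idea can be sketched in Python roughly as follows. This is one possible reading of it, not a definitive implementation: words are compared position by position, the score is divided by the longer word count, and a string joins the first group that contains a close enough member. The function names and the 0.75 threshold are illustrative assumptions.

```python
import re

STRINGS = [
    "[001].[FOO].[TEST] - 'foofoo.test'",
    "[002].[FOO].[TEST] - 'foofoo.test'",
    "[003].[FOO].[TEST] - 'foofoo.test'",
    "[001].[FOO].[TEST] - 'foofoo.test.sample'",
    "[002].[FOO].[TEST] - 'foofoo.test.sample'",
    "-001- BAR.[TEST] - 'bartest.xx1",
    "-002- BAR.[TEST] - 'bartest.xx1",
]

def tokens(s):
    # Split on anything that is not an ASCII letter or digit.
    return re.findall(r"[A-Za-z0-9]+", s)

def similarity(a, b):
    # Fraction of word positions holding the same word in both strings,
    # relative to the longer of the two word lists.
    ta, tb = tokens(a), tokens(b)
    longest = max(len(ta), len(tb))
    if longest == 0:
        return 1.0
    same = sum(1 for x, y in zip(ta, tb) if x == y)
    return same / longest

def group_by_similarity(strings, threshold):
    # Greedy grouping: join the first group with a close enough member,
    # otherwise start a new group.
    groups = []
    for s in strings:
        for group in groups:
            if any(similarity(s, member) >= threshold for member in group):
                group.append(s)
                break
        else:
            groups.append([s])
    return groups

if __name__ == "__main__":
    for group in group_by_similarity(STRINGS, threshold=0.75):
        print(group)
```

On the sample strings this separates the five FOO strings from the two BAR strings, but it does not recover the subgroups, and the length sensitivity described above remains: one differing word costs 20% on a 5-word string and only 5% on a 20-word string.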

My question to you is, what would be a logical approach in the above situation?

As for a real-life example:

Thanks in advance!


Source: https://stackoverflow.com/questions/2571226
Updated: 2023-09-13 20:09

Best answer

You should set tickInterval to 24 * 3600 * 1000 (one day in milliseconds).

xAxis: {
   tickInterval: 24 * 3600 * 1000
}
