知识点
相关文章
更多最近更新
更多Vivek's blog - Roll your own autocomplete solution using Tries.
2019-03-27 01:07|来源: 网路
Vivek's blog - Roll your own autocomplete solution using Tries.
Roll your own autocomplete solution using Tries.
You might have come across many websites with autocomplete suggestions, most notably Google.
Adding such an option to your site or application might seem daunting but there is a very simple recursive data structure that solves the problem. There is a ton of literature on the net on how to do this using black box approaches like Lucene, Solr, Sphinx, Redis etc. But all these packages require a lot of configuration and you also lose flexibility. Tries can be implemented in a few lines of code in any language of your choice.
A trie is basically a tree, with each node representing a letter as illustrated in the figure above. Words are paths along this tree and the root node has no characters associated with it.
- The value of each node is the path or the character sequence leading upto it.
- The children at each node are ideally represented using a hash table mapping the next character to the child nodes.
- Its also useful to set a flag at every node to indicate whether a word/phrase ends there.
Now we can define methods to insert a string and to search for one.
The insertion cost has a linear relationship with the string length. Now lets define the methods for listing all the strings which start with a certain prefix. The idea is to traverse down to the node representing the prefix and from there do a breadth first search of all the nodes that are descendants of the prefix node. We check if the node is at a word boundary from the flag we defined earlier and append the node’s value to the results. A caveat of the recursive approach, is you might run into stack depth problems when the maximum length of a string reaches tens of thousands of characters.class Trie ( object ):def __init__ ( self , value = None ):self . children = {}self . value = valueself . flag = False # Flag to indicate that a word ends at this nodedef add ( self , char ):val = self . value + char if self . value else charself . children [ char ] = Trie ( val )def insert ( self , word ):node = selffor char in word :if char not in node . children :node . add ( char )node = node . children [ char ]node . flag = Truedef find ( self , word ):node = selffor char in word :if char not in node . children :return Nonenode = node . children [ char ]return node . value
class Trie ( object ):...def all_prefixes ( self ):results = set ()if self . flag :results . add ( self . value )if not self . children : return resultsreturn reduce ( lambda a , b : a | b ,[ node . all_prefixes () fornode in self . children . values ()]) | resultsdef autocomplete ( self , prefix ):node = selffor char in prefix :if char not in node . children :return set ()node = node . children [ char ]return node . all_prefixes ()To delete an item, first traverse down to the leaf of the keyword and then work backwards, checking for common paths.
And there you have it, an efficient way of retrieving strings with a common prefix in less than 40 lines of python code. I also wrote a C++ version using the STL data structures in about 90 lines, the time complexity for this version however is O(n log n) as the STL uses a red-black tree implementation for an associative array. Colin Dean has wrote a Ruby version and Marcus McCurdy, a Java one. You can find them all over here - https://github.com/vivekn/autocomplete. Read more about tries at Wikipedia.
Update: Thanks to bmahler on reddit for pointing out that the wordlist approach was unnecessary and space inefficient. It has been corrected.
Vivek's blog - Roll your own autocomplete solution using Tries.
Roll your own autocomplete solution using Tries.
You might have come across many websites with autocomplete suggestions, most notably Google.
Adding such an option to your site or application might seem daunting but there is a very simple recursive data structure that solves the problem. There is a ton of literature on the net on how to do this using black box approaches like Lucene, Solr, Sphinx, Redis etc. But all these packages require a lot of configuration and you also lose flexibility. Tries can be implemented in a few lines of code in any language of your choice.
A trie is basically a tree, with each node representing a letter as illustrated in the figure above. Words are paths along this tree and the root node has no characters associated with it.
- The value of each node is the path or the character sequence leading upto it.
- The children at each node are ideally represented using a hash table mapping the next character to the child nodes.
- Its also useful to set a flag at every node to indicate whether a word/phrase ends there.
Now we can define methods to insert a string and to search for one.
The insertion cost has a linear relationship with the string length. Now lets define the methods for listing all the strings which start with a certain prefix. The idea is to traverse down to the node representing the prefix and from there do a breadth first search of all the nodes that are descendants of the prefix node. We check if the node is at a word boundary from the flag we defined earlier and append the node’s value to the results. A caveat of the recursive approach, is you might run into stack depth problems when the maximum length of a string reaches tens of thousands of characters.class Trie ( object ):def __init__ ( self , value = None ):self . children = {}self . value = valueself . flag = False # Flag to indicate that a word ends at this nodedef add ( self , char ):val = self . value + char if self . value else charself . children [ char ] = Trie ( val )def insert ( self , word ):node = selffor char in word :if char not in node . children :node . add ( char )node = node . children [ char ]node . flag = Truedef find ( self , word ):node = selffor char in word :if char not in node . children :return Nonenode = node . children [ char ]return node . value
class Trie ( object ):...def all_prefixes ( self ):results = set ()if self . flag :results . add ( self . value )if not self . children : return resultsreturn reduce ( lambda a , b : a | b ,[ node . all_prefixes () fornode in self . children . values ()]) | resultsdef autocomplete ( self , prefix ):node = selffor char in prefix :if char not in node . children :return set ()node = node . children [ char ]return node . all_prefixes ()To delete an item, first traverse down to the leaf of the keyword and then work backwards, checking for common paths.
And there you have it, an efficient way of retrieving strings with a common prefix in less than 40 lines of python code. I also wrote a C++ version using the STL data structures in about 90 lines, the time complexity for this version however is O(n log n) as the STL uses a red-black tree implementation for an associative array. Colin Dean has wrote a Ruby version and Marcus McCurdy, a Java one. You can find them all over here - https://github.com/vivekn/autocomplete. Read more about tries at Wikipedia.
Update: Thanks to bmahler on reddit for pointing out that the wordlist approach was unnecessary and space inefficient. It has been corrected.
转自:http://www.cnblogs.com/lexus/archive/2012/03/05/2379830
相关问答
更多-
add() - 几乎总是如你所说 roll() - 例如你想在一个月内“分配”事件。 该算法可能会进行几天,然后放置事件,然后继续进行。 当月底到达时,应该从一开始就重新开始。 因此roll() 。 add() - almost always, as you said roll() - for example you want to "dispense" events in one month. The algorithm may be to proceed a number of days and pla ...
-
您需要进行一些分页才能访问所有照片。 基本上,您将它们装入块中,并在每次获取后跟踪您停止的位置。 你会想要一个类似的状态: this.state = { dataSource: ds.cloneWithRows([]), assets: [], lastCursor: null, noMorePhotos: false, loadingMore: false, }; 然后获取与这些类似的功能。 此示例假定您使用ListView使用ListView显示照片 tryPhotoLoad() ...
-
不幸的是,这是不可能的。 Photo-Library的工作方式是“自己的相册”中的每张照片都只是对保存到相机胶卷的实际照片的引用。 干杯, 亨德里克 unfortunately this is not possible. The Photo-Library works in the way that every photo in an "own photo album" is only a reference to the actual photo that is saved to the Camera R ...
-
唯一可靠的长期/可扩展解决方案是将博客永久托管在子域或不同域上,并将重定向从mydomain.com/blog添加到新位置(即:blog.mydomain.com)。 您需要在mydomain.com上运行像Apache / nginx这样的前端的单个服务器来提供像Rails和Wordpress这样的混合后端,这在Heroku上是不可能的。 可悲的是,这是您需要深入了解顾问并对您的客户严格控制技术限制的地方。 为什么客户要迁移到Heroku? 有没有一个更大的目标,你可以通过控制前端的不同主机完成,并可以混 ...
-
我找到了我正在寻找的旋转矩阵。 我使用欧拉角度(滚动,俯仰,偏航)进行俯仰和滚转。 当手机结束90度时,x和z平原相同,手机发疯,这是欧拉角的根本缺陷。 我需要通过getRotationMatrix使用旋转矩阵来获取俯仰和滚转度数 这里是所有的;) XML:
Ruby on Rails博客(Ruby on Rails Blog)[2022-01-14]
尝试使用Mephisto PS:在谷歌搜索“rails博客引擎”,你会发现很多其他的点击。 Try using Mephisto PS: Search for "rails blog engine" in Google and you will find many other hits._renderItem函数应该是autocomplete选项的一部分: $( "#tut_search" ).autocomplete({ minLength: 2, source: "PHP_Code/MyAjax.php?page=tut_search_ac", _renderItem: function(ul, item) { return $("") .data("item.autocomplete", item) ...在.htaccess中使用此代码来实现: Options +FollowSymLinks -MultiViews RewriteEngine on RewriteCond %{QUERY_STRING} !^$ RewriteRule ^blog/?$ / [R=301,L,QSA] Use this code in your .htaccess to achieve that: Options +FollowSymLinks -MultiViews RewriteEngine on RewriteC ...解析博客文章(Parsing blog post)[2023-04-13]
您可以使用RSS源解析所有数据,并创建一个特定的屏幕以您喜欢的方式显示解析的数据。 You can parse all the data using the RSS feeds and create a specific screen to display this parsed data the way you like.Swift脚本自动完成(Swift Scripting Autocomplete)[2022-02-22]
如果您不需要引用其他文件,则应该尝试在XCode中使用游乐场。 这使您可以执行一些基本调试。 You should try using a playground in XCode if you don't need to reference other files. This lets you perform some basic debug as well.