Get links from search engines in C#
First of all, please excuse my broken English.
I want to write a metasearch engine. I first tried the Google, Bing, and Yahoo APIs, but they were limited.
So now I am trying to use the HTML Agility Pack to extract the result links from the search engines' pages.
I have this code:

    using HtmlAgilityPack;
    using System;
    using System.Collections.Generic;
    using System.ComponentModel;
    using System.Data;
    using System.Drawing;
    using System.IO;
    using System.Linq;
    using System.Net;
    using System.ServiceModel.Syndication;
    using System.Text;
    using System.Threading.Tasks;
    using System.Windows.Forms;
    using System.Xml;

    namespace Search
    {
        public partial class Form1 : Form
        {
            // load snippet
            HtmlAgilityPack.HtmlDocument htmlSnippet = new HtmlAgilityPack.HtmlDocument();

            public Form1()
            {
                InitializeComponent();
            }

            private void btn1_Click(object sender, EventArgs e)
            {
                listBox1.Items.Clear();
                StringBuilder sb = new StringBuilder();
                byte[] ResultsBuffer = new byte[8192];
                string SearchResults = "http://google.com/search?q=" + txtKeyWords.Text.Trim();
                HttpWebRequest request = (HttpWebRequest)WebRequest.Create(SearchResults);
                HttpWebResponse response = (HttpWebResponse)request.GetResponse();
                Stream resStream = response.GetResponseStream();
                string tempString = null;
                int count = 0;
                do
                {
                    count = resStream.Read(ResultsBuffer, 0, ResultsBuffer.Length);
                    if (count != 0)
                    {
                        tempString = Encoding.ASCII.GetString(ResultsBuffer, 0, count);
                        sb.Append(tempString);
                    }
                } while (count > 0);
                string sbb = sb.ToString();

                HtmlAgilityPack.HtmlDocument html = new HtmlAgilityPack.HtmlDocument();
                html.OptionOutputAsXml = true;
                html.LoadHtml(sbb);
                HtmlNode doc = html.DocumentNode;

                foreach (HtmlNode link in doc.SelectNodes("//a[@href]"))
                {
                    //HtmlAttribute att = link.Attributes["href"];
                    string hrefValue = link.GetAttributeValue("href", string.Empty);
                    if (!hrefValue.ToString().ToUpper().Contains("GOOGLE") && hrefValue.ToString().Contains("/url?q=") && hrefValue.ToString().ToUpper().Contains("HTTP://"))
                    {
                        int index = hrefValue.IndexOf("&");
                        if (index > 0)
                        {
                            hrefValue = hrefValue.Substring(0, index);
                            listBox1.Items.Add(hrefValue.Replace("/url?q=", ""));
                        }
                    }
                }
            }
        }
    }
Can I use this code for all search engines? I changed these lines to make it work for another search engine:

    if (!hrefValue.ToString().ToUpper().Contains("YAHOO") && hrefValue.ToString().Contains("/url?q=") && hrefValue.ToString().ToUpper().Contains("HTTP://"))

and

    string SearchResults = "http://yahoo.com/search?q=" + textBox1.Text.Trim();

but it doesn't work.
My other problem is that this code only returns the links from the first page. What should I do if I want to return the first N links?
Can anybody help?
Source: https://stackoverflow.com/questions/37250808
Best answer
You can do this in O(n) in any language, basically as:

    # Get min and max values O(n).
    min = oldList[0]
    max = oldList[0]
    for i = 1 to oldList.size() - 1:
        if oldList[i] < min: min = oldList[i]
        if oldList[i] > max: max = oldList[i]

    # Initialise boolean list O(n)
    isInList = new boolean[max - min + 1]
    for i = min to max:
        isInList[i - min] = false

    # Change booleans for values in old list O(n)
    for i = 0 to oldList.size() - 1:
        isInList[oldList[i] - min] = true

    # Create new list from booleans O(n) (or O(1) based on integer range).
    newList = []
    for i = min to max:
        if isInList[i - min]:
            newList.append (i)
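For anyone who wants to run it, here is a minimal Python sketch of the pseudocode above. The function name `unique_in_range` and the empty-list check are my own additions, not part of the answer. The flag list is sized by the value range (max - min + 1), so this is only practical when that range is modest.

```python
def unique_in_range(old_list):
    """Return the distinct integers of old_list in ascending order."""
    if not old_list:
        # Assumption: an empty input yields an empty result
        # (the pseudocode does not cover this case).
        return []

    # Get min and max values: one pass each over the input.
    lo = min(old_list)
    hi = max(old_list)

    # One boolean flag per possible value in [lo, hi], all False.
    is_in_list = [False] * (hi - lo + 1)

    # Mark every value that occurs in the old list.
    for value in old_list:
        is_in_list[value - lo] = True

    # Collect the marked values; list.append is amortised O(1).
    new_list = []
    for offset, present in enumerate(is_in_list):
        if present:
            new_list.append(lo + offset)
    return new_list


print(unique_in_range([5, 3, 9, 3, 5, 7]))  # prints [3, 5, 7, 9]
```

Note that the result comes out sorted as a side effect, because the flags are scanned in value order.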
I'm assuming here that append is an O(1) operation, which it should be unless the implementer was brain-dead. So with k steps each O(n), you still have an O(n) operation.
Whether the steps are explicitly done in your code or whether they're done under the covers of a language is irrelevant. Otherwise you could claim that the C qsort was one operation and you now have the holy grail of an O(1) sort routine :-)
As many people have discovered, you can often trade off space complexity for time complexity. For example, the above only works because we're allowed to introduce the isInList and newList variables. If this were not allowed, the next best solution may be sorting the list (probably no better than O(n log n)) followed by an O(n) (I think) operation to remove the duplicates.
As an extreme example, you can use that same extra-space method to sort an arbitrary number of 32-bit integers (say with each having 255 or fewer duplicates) in O(n) time, provided you can allocate about four billion bytes for storing the counts.
Simply initialise all the counts to zero and run through each position in your list, incrementing the count based on the number at that position. That's O(n).
Then start at the beginning of the list and run through the count array, placing that many of the correct value in the list. That's O(1), with the 1 being about four billion of course but still constant time :-)
That's also O(1) space complexity but a very big "1". Typically trade-offs aren't quite that severe.
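The counting approach described above can be sketched in Python as follows. To keep it runnable on an ordinary machine, this sketch assumes unsigned 16-bit values by default (65,536 one-byte counters) instead of the full 32-bit range; the function name `counting_sort_in_place` and the `bits` parameter are hypothetical names of my own. A `bytearray` holds one byte per count, which mirrors the "255 or fewer duplicates" restriction: a 256th duplicate raises `ValueError`.

```python
def counting_sort_in_place(values, bits=16):
    """Sort a list of unsigned integers below 2**bits using byte-sized counts."""
    size = 1 << bits

    # Initialise all counts to zero: one byte per possible value.
    counts = bytearray(size)

    # Run through the list, incrementing the count for each number. O(n).
    for v in values:
        if not 0 <= v < size:
            raise ValueError(f"value {v} is outside the range 0..{size - 1}")
        counts[v] += 1  # raises ValueError once a count would exceed 255

    # Run through the count array, writing each value back into the list
    # as many times as it was counted. The outer loop always takes
    # `size` steps, regardless of n.
    pos = 0
    for value, count in enumerate(counts):
        for _ in range(count):
            values[pos] = value
            pos += 1
    return values


print(counting_sort_in_place([3, 1, 65535, 1, 0]))  # prints [0, 1, 1, 3, 65535]
```

With `bits=32` the counter array would need about four billion bytes, which is the very big "1" mentioned above.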
Related Q&A
- Because space complexity represents the extra space an algorithm takes besides the input. Complexity, in general, is defined in relation to Turing machines. The space an algorithm takes is the number of extra cells it needs in order to run. The input cells are not counted, and the algorithm can reuse them to reduce the extra storage.
- Time Complexity confusion [2024-03-21]
  You can do this in O(n) in any language, basically as: # Get min and max values O(n). min = oldList[0] max = oldList[0] for i = 1 to oldList.size() - 1: if oldList[i] < min: min = oldList[i] if oldList[i] > max: max = oldList[i] # Initialise boolean list O(n ...
- Merge Sort Complexity Confusion [2022-08-17]
  The levels of function calls are accounted for like this (in the book Introduction to Algorithms, https://mitpress.mit.edu/books/introduction-algorithms, Chapter 2.3.2): We reason as follows to set up the recurrence for T(n), the worst-case running time of merge sort on n numbers. Merge sort on just one element takes constant time. When we have n > 1 elements, we break down the running time as follows. Divide: The divide step just computes the middle of the subarray, which takes constant time. Thus, D(n) = Θ(1). Conquer: We recursively solve two subproblems, each of size n / ...
- Time complexity of these 2 program [2022-07-21]
  In fact, both have the same complexity: O(n^3). That's because you're using += to concatenate the answer! There's a hidden loop there that you aren't accounting for, a classic example of the Schlemiel the Painter's algorithm. You should use StringBuilder instead, which is the right way to build up a string as you go.
- algorithm complexity confusion [2022-04-19]
  First of all, the complexity of the first loop is O(n * ). Second, this assumes n is an input parameter to your algorithm. If n is a constant known in advance, then your estimate of O( ) is correct.
- 'if not in' time complexity [2022-12-03]
  In the question you call them iterable, so I'm assuming they are not a set or similar, and that to determine whether x not in other_iterable is true you have to check the values in other_iterable one at a time. This would be the case if, for example, they were lists or generators. Time complexity is about the worst case; it is an upper bound. So in this case the worst case is when everything in iterable is in other_iterable but is the last item returned. Then, for each of the n items in iterable, you would check each of the m items in other_iterable, and the total number of operations would be O( ...
- Time Complexity of BST [2023-10-09]
  The average case time complexity will be O(log n) and the worst case would be O(n). To understand the O(log n) complexity, you can refer to "What does O(log n) mean exactly?" This image will explain the "how" part: I would also suggest going through the wiki for details.
- Time complexity of union [2022-09-15]
  So, the time complexity of the algorithm equals the time complexity of lines (3-9) + O(E). What is the time complexity of the union? No, it is not the complexity of the union; a union can be done very efficiently if you use a hash table. Also, since you only use S for the union, it seems to be redundant. The complexity of the algorithm also depends heavily on your EXTRACT-MAX(Q) function (usually logarithmic in the size of Q, so log V per iteration) and on RELAX(u,v,r) (usually also logarithmic in the size of Q, because you need to update entries in the priority queue). As expected, this brings us to the complexity of the original Dijkstra's algorithm, which is O(E+Vlo ...
- You misunderstand. You have an O(n) loop. The loop over the generator function is not a nested loop; it merely receives each item from the generator as it is yielded. In other words, the for factor in calc_factor(100) loop is directly connected to the yield k expression; each time that expression executes, the for factor in calc_factor(100) loop advances one step. For every yield k expression executed, you get 1 factor value. yield k ...
- Confusion in time complexity [2023-09-18]
  In the event that n and c are positive numbers, then yes, the second for loop won't execute. It appears to me that those for loops were written incorrectly in that link.