
Get links from search engines in C#

First of all, please excuse my broken English.
I want to write a metasearch engine. I first tried the Google, Bing, and Yahoo APIs, but they were too limited.
So now I am trying to use HtmlAgilityPack to extract the result links from the search engines' result pages.
I have this code:
using HtmlAgilityPack;
using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.IO;
using System.Linq;
using System.Net;
using System.ServiceModel.Syndication;
using System.Text;
using System.Threading.Tasks;
using System.Windows.Forms;
using System.Xml;

namespace Search
{
public partial class Form1 : Form
{
    // load snippet
    HtmlAgilityPack.HtmlDocument htmlSnippet = new HtmlAgilityPack.HtmlDocument();

    public Form1()
    {
        InitializeComponent();
    }

    private void btn1_Click(object sender, EventArgs e)
    {
        listBox1.Items.Clear();
        StringBuilder sb = new StringBuilder();
        byte[] ResultsBuffer = new byte[8192];
        string SearchResults = "http://google.com/search?q=" + txtKeyWords.Text.Trim();
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(SearchResults);
        HttpWebResponse response = (HttpWebResponse)request.GetResponse();

        Stream resStream = response.GetResponseStream();
        string tempString = null;
        int count = 0;
        do
        {
            count = resStream.Read(ResultsBuffer, 0, ResultsBuffer.Length);
            if (count != 0)
            {
                tempString = Encoding.ASCII.GetString(ResultsBuffer, 0, count);
                sb.Append(tempString);
            }
        }

        while (count > 0);
        string sbb = sb.ToString();

        HtmlAgilityPack.HtmlDocument html = new HtmlAgilityPack.HtmlDocument();
        html.OptionOutputAsXml = true;
        html.LoadHtml(sbb);
        HtmlNode doc = html.DocumentNode;

        foreach (HtmlNode link in doc.SelectNodes("//a[@href]"))
        {
            //HtmlAttribute att = link.Attributes["href"];
            string hrefValue = link.GetAttributeValue("href", string.Empty);
            if (!hrefValue.ToString().ToUpper().Contains("GOOGLE") && hrefValue.ToString().Contains("/url?q=") && hrefValue.ToString().ToUpper().Contains("HTTP://"))
            {
                int index = hrefValue.IndexOf("&");
                if (index > 0)
                {
                    hrefValue = hrefValue.Substring(0, index);
                    listBox1.Items.Add(hrefValue.Replace("/url?q=", ""));
                }
            }
        }
    }
}
}
Can I use this code for all search engines? I changed these lines so that it would work with other search engines:
if (!hrefValue.ToString().ToUpper().Contains("YAHOO") && hrefValue.ToString().Contains("/url?q=") && hrefValue.ToString().ToUpper().Contains("HTTP://"))
 
and 
 
string SearchResults = "http://yahoo.com/search?q=" + textBox1.Text.Trim();
 
but it doesn't work.
 
My other problem is that this code returns only the links from the first results page. What should I do if I want the first N links?
Can anybody help?

Source: https://stackoverflow.com/questions/37250808

Updated: 2023-03-01 17:03

Best answer

You can do this in O(n) in any language, basically as: 
# Get min and max values O(n).

min = oldList[0]
max = oldList[0]
for i = 1 to oldList.size() - 1:
    if oldList[i] < min:
        min = oldList[i]
    if oldList[i] > max:
        max = oldList[i]

# Initialise boolean list O(n)

isInList = new boolean[max - min + 1]
for i = min to max:
    isInList[i - min] = false

# Change booleans for values in old list O(n)

for i = 0 to oldList.size() - 1:
    isInList[oldList[i] - min] = true

# Create new list from booleans O(n) (or O(1) based on integer range).

newList = []
for i = min to max:
    if isInList[i - min]:
        newList.append (i)
 
I'm assuming here that append is an O(1) operation, which it should be unless the implementer was brain-dead. So with k steps each O(n), you still have an O(n) operation. 
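The pseudocode above might be sketched in runnable Python roughly as follows. The function name `dedupe_sorted_range` is made up for illustration, and the sketch assumes the input holds only integers. Note that the flag array grows with the spread `max - min`, not with the number of elements, so it only pays off when the values are reasonably dense.

```python
def dedupe_sorted_range(old_list):
    """Return the distinct integers of old_list in ascending order.

    Hypothetical helper mirroring the pseudocode; assumes integer input.
    """
    if not old_list:
        return []

    # Get min and max values: O(n).
    lo, hi = min(old_list), max(old_list)

    # One flag per possible value in [lo, hi], all initially False.
    is_in_list = [False] * (hi - lo + 1)

    # Mark every value that occurs in the input: O(n).
    for value in old_list:
        is_in_list[value - lo] = True

    # Collect the marked values in order: O(hi - lo + 1).
    return [lo + offset for offset, seen in enumerate(is_in_list) if seen]
```

For example, `dedupe_sorted_range([5, 3, 5, 7, 3])` walks a five-slot flag array covering 3 through 7 and keeps the three slots that were marked.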
Whether the steps are explicitly done in your code or whether they're done under the covers of a language is irrelevant. Otherwise you could claim that the C qsort was one operation and you now have the holy grail of an O(1) sort routine :-) 
As many people have discovered, you can often trade off space complexity for time complexity. For example, the above only works because we're allowed to introduce the isInList and newList variables. If this were not allowed, the next best solution may be sorting the list (probably no better than O(n log n)) followed by an O(n) (I think) operation to remove the duplicates.
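A minimal sketch of that sort-then-compact alternative, assuming the list may be modified in place (the name `dedupe_by_sorting` is illustrative). The compaction pass is a single O(n) scan with two indices; whatever scratch space the built-in sort uses internally is not counted here.

```python
def dedupe_by_sorting(values):
    """Sort values in place, then drop adjacent duplicates in one pass."""
    values.sort()  # O(n log n)
    if not values:
        return values

    write = 1  # next slot to fill with a not-yet-seen value
    for read in range(1, len(values)):
        # After sorting, duplicates are adjacent, so comparing with the
        # last kept value is enough.
        if values[read] != values[write - 1]:
            values[write] = values[read]
            write += 1

    del values[write:]  # trim the leftover tail
    return values
```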
As an extreme example, you can use that same extra-space method to sort an arbitrary number of 32-bit integers (say, with each value occurring at most 255 times) in O(n) time, provided you can allocate about four billion bytes for storing the counts.
Simply initialise all the counts to zero and run through each position in your list, incrementing the count based on the number at that position. That's O(n). 
Then start at the beginning of the list and run through the count array, placing that many of the correct value in the list. That's O(1), with the 1 being about four billion of course but still constant time :-) 
That's also O(1) space complexity but a very big "1". Typically trade-offs aren't quite that severe.
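The counting approach described above can be tried at a much smaller scale. The sketch below assumes values in the range 0 to 255 rather than the full 32-bit range, so the count table is 256 bytes instead of roughly four billion; the function name and the `lo`/`hi` parameters are illustrative. A `bytearray` is used for the counts to mirror the one-byte-per-count idea, which means a value occurring more than 255 times raises an error.

```python
def counting_sort(values, lo=0, hi=255):
    """Sort integers in [lo, hi] in place using one byte-sized count each."""
    counts = bytearray(hi - lo + 1)  # all counts start at zero

    # Tally each value: O(n).
    for value in values:
        if not lo <= value <= hi:
            raise ValueError(f"value {value} outside [{lo}, {hi}]")
        counts[value - lo] += 1  # ValueError if a count would exceed 255

    # Walk the count table and write each value back that many times.
    # The table has a fixed size, so this scan is constant in n
    # apart from the n writes themselves.
    pos = 0
    for offset, count in enumerate(counts):
        for _ in range(count):
            values[pos] = lo + offset
            pos += 1
    return values
```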
