Get links from search engines in C#

First of all, please excuse my broken English.
I want to write a metasearch engine. First I tried the Google, Bing, and Yahoo APIs, but they were limited.
Then I tried to use HtmlAgilityPack to get the result links from the search engines.
I have this code:

using HtmlAgilityPack;
using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.IO;
using System.Linq;
using System.Net;
using System.ServiceModel.Syndication;
using System.Text;
using System.Threading.Tasks;
using System.Windows.Forms;
using System.Xml;

namespace Search
{
public partial class Form1 : Form
{
    // load snippet
    HtmlAgilityPack.HtmlDocument htmlSnippet = new HtmlAgilityPack.HtmlDocument();

    public Form1()
    {
        InitializeComponent();
    }

    private void btn1_Click(object sender, EventArgs e)
    {
        listBox1.Items.Clear();
        StringBuilder sb = new StringBuilder();
        byte[] ResultsBuffer = new byte[8192];
        string SearchResults = "http://google.com/search?q=" + txtKeyWords.Text.Trim();
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(SearchResults);
        HttpWebResponse response = (HttpWebResponse)request.GetResponse();

        Stream resStream = response.GetResponseStream();
        string tempString = null;
        int count = 0;
        do
        {
            count = resStream.Read(ResultsBuffer, 0, ResultsBuffer.Length);
            if (count != 0)
            {
                tempString = Encoding.ASCII.GetString(ResultsBuffer, 0, count);
                sb.Append(tempString);
            }
        } while (count > 0);
        string sbb = sb.ToString();

        HtmlAgilityPack.HtmlDocument html = new HtmlAgilityPack.HtmlDocument();
        html.OptionOutputAsXml = true;
        html.LoadHtml(sbb);
        HtmlNode doc = html.DocumentNode;

        foreach (HtmlNode link in doc.SelectNodes("//a[@href]"))
        {
            //HtmlAttribute att = link.Attributes["href"];
            string hrefValue = link.GetAttributeValue("href", string.Empty);
            if (!hrefValue.ToString().ToUpper().Contains("GOOGLE") && hrefValue.ToString().Contains("/url?q=") && hrefValue.ToString().ToUpper().Contains("HTTP://"))
            {
                int index = hrefValue.IndexOf("&");
                if (index > 0)
                {
                    hrefValue = hrefValue.Substring(0, index);
                    listBox1.Items.Add(hrefValue.Replace("/url?q=", ""));
                }
            }
        }
    }
}

}

Can I use this code for all search engines? I changed these lines so that it would work with other search engines:

if (!hrefValue.ToString().ToUpper().Contains("YAHOO") && hrefValue.ToString().Contains("/url?q=") && hrefValue.ToString().ToUpper().Contains("HTTP://"))

and

string SearchResults = "http://yahoo.com/search?q=" + textBox1.Text.Trim();

but it doesn't work.

My other problem is that this code only returns the links from the first page. What should I do if I want to return the first N links?
Can anybody help?


Source: https://stackoverflow.com/questions/37250808
Updated: 2023-03-01 17:03

Best answer

You can do this in O(n) in any language, basically as:

# Get min and max values O(n).

min = oldList[0]
max = oldList[0]
for i = 1 to oldList.size() - 1:
    if oldList[i] < min:
        min = oldList[i]
    if oldList[i] > max:
        max = oldList[i]

# Initialise boolean list O(n)

isInList = new boolean[max - min + 1]
for i = min to max:
    isInList[i - min] = false

# Change booleans for values in old list O(n)

for i = 0 to oldList.size() - 1:
    isInList[oldList[i] - min] = true

# Create new list from booleans O(n) (or O(1) based on integer range).

newList = []
for i = min to max:
    if isInList[i - min]:
        newList.append(i)

I'm assuming here that append is an O(1) operation, which it should be unless the implementer was brain-dead. So with k steps each O(n), you still have an O(n) operation.
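
The steps above can be sketched as runnable Python. This is only a minimal illustration, assuming the input is a list of integers; the name unique_sorted is made up for the example and is not from the original code.

```python
def unique_sorted(old_list):
    """Return the distinct integers of old_list in ascending order."""
    if not old_list:
        return []

    # Get min and max values in one pass each.
    lo, hi = min(old_list), max(old_list)

    # One flag per possible value in [lo, hi], all initially False.
    is_in_list = [False] * (hi - lo + 1)

    # Mark every value that occurs in the input.
    for value in old_list:
        is_in_list[value - lo] = True

    # Collect the marked values; offsets are relative to lo.
    return [lo + offset for offset, present in enumerate(is_in_list) if present]


print(unique_sorted([3, 1, 3, 7, 1]))
```

Note that the flag list has max - min + 1 entries, so the work is proportional to the list length plus the size of the value range.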

Whether the steps are explicitly done in your code or whether they're done under the covers of a language is irrelevant. Otherwise you could claim that the C qsort was one operation and you now have the holy grail of an O(1) sort routine :-)

As many people have discovered, you can often trade off space complexity for time complexity. For example, the above only works because we're allowed to introduce the isInList and newList variables. If this were not allowed, the next best solution may be sorting the list (probably no better than O(n log n)) followed by an O(n) (I think) operation to remove the duplicates.
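
A rough Python sketch of that sort-then-deduplicate fallback, working in place on the list. The name dedupe_in_place is hypothetical, and keep in mind that list.sort may itself use some temporary storage internally.

```python
def dedupe_in_place(values):
    """Sort values in place and drop duplicates, keeping one copy of each."""
    values.sort()  # O(n log n) comparison sort

    # Single O(n) pass: 'write' marks the end of the deduplicated prefix.
    write = 0
    for read in range(len(values)):
        if read == 0 or values[read] != values[write - 1]:
            values[write] = values[read]
            write += 1

    # Cut off the leftover tail behind the deduplicated prefix.
    del values[write:]
    return values


print(dedupe_in_place([3, 1, 3, 7, 1]))
```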

As an extreme example, you can use that same extra-space method to sort an arbitrary number of 32-bit integers (say with each only having 255 or fewer duplicates) in O(n) time, provided you can allocate about four billion bytes for storing the counts.

Simply initialise all the counts to zero and run through each position in your list, incrementing the count based on the number at that position. That's O(n).

Then start at the beginning of the list and run through the count array, placing that many of the correct value in the list. That's O(1), with the 1 being about four billion of course but still constant time :-)

That's also O(1) space complexity but a very big "1". Typically trade-offs aren't quite that severe.
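
The counting approach described above can be sketched in Python on a much smaller scale. This is an illustration only: it assumes every value is an integer in the range 0 to value_range - 1, and both the name counting_sort and the default range of 256 are chosen for the example (the 32-bit case would need 2**32 counters instead).

```python
def counting_sort(values, value_range=256):
    """Sort values in place; assumes 0 <= v < value_range for every v."""
    # Initialise all the counts to zero.
    counts = [0] * value_range

    # One pass over the list, O(n): count each value.
    for v in values:
        counts[v] += 1

    # Walk the count array and write each value back 'count' times.
    pos = 0
    for v, count in enumerate(counts):
        for _ in range(count):
            values[pos] = v
            pos += 1
    return values


print(counting_sort([5, 3, 5, 0, 255]))
```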
