
Simplest (legal) way to programmatically get the Google search result count?


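I want to use Java code to get the estimated result count for certain Google search queries (over the whole web).

I only need to run a few queries per day, so at first the Google Web Search API, although deprecated, seemed good enough (see e.g. How can you search Google Programmatically Java API). However, it turns out that the numbers this API returns are very different from the ones www.google.com returns (see e.g. http://code.google.com/p/google-ajax-apis/issues/detail?id=32), so they are useless to me.

I also tried the Google Custom Search engine, and it shows the same problem.

What do you think is the simplest solution for my task?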

2 Answers

  • Answer 1
    /** @author RAJESH Kharche */
    // Setup: open NetBeans, choose Java -> Project, and name the project GoogleSearchAPP.
    
    package googlesearchapp;
    
    import java.io.IOException;
    import java.net.URL;
    import java.net.URLConnection;
    import java.net.URLEncoder;
    import java.util.Scanner;
    import java.util.logging.Level;
    import java.util.logging.Logger;
    
    public class GoogleSearchAPP {
        // Marker that precedes the result count in the results page HTML.
        private static final String MARKER = "\"resultStats\">";
    
        public static void main(String[] args) {
            Scanner in = new Scanner(System.in);
            System.out.println("Enter query to search: ");
            String query = in.nextLine(); // read the whole line so multi-word queries work
            try {
                long result = getResultsCount(query);
                System.out.println("Results: " + result);
            } catch (IOException ex) {
                Logger.getLogger(GoogleSearchAPP.class.getName()).log(Level.SEVERE, null, ex);
            }
        }
    
        private static long getResultsCount(final String query) throws IOException {
            final URL url = new URL("https://www.google.com/search?q=" + URLEncoder.encode(query, "UTF-8"));
            final URLConnection connection = url.openConnection();
            connection.setConnectTimeout(60000);
            connection.setReadTimeout(60000);
            connection.addRequestProperty("User-Agent", "Google Chrome/36"); // browser name/version to send
    
            // Scan the HTTP response body line by line; try-with-resources closes the Scanner on every path.
            try (Scanner reader = new Scanner(connection.getInputStream(), "UTF-8")) {
                while (reader.hasNextLine()) {
                    final String line = reader.nextLine();
                    final int start = line.indexOf(MARKER);
                    if (start < 0) {
                        continue; // not the line holding the "resultStats" element
                    }
                    // Take the text after the marker up to the next tag and keep only its digits.
                    final int from = start + MARKER.length();
                    final int end = line.indexOf('<', from);
                    final String digits = (end < 0 ? line.substring(from) : line.substring(from, end))
                            .replaceAll("[^\\d]", "");
                    // Parse as long: counts can exceed Integer.MAX_VALUE.
                    return digits.isEmpty() ? 0 : Long.parseLong(digits);
                }
            }
            return 0; // no "resultStats" element found
        }
    }
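    One way to make the parsing step easier to check is to separate it from the network call. The sketch below is a hypothetical helper (`ResultStatsParser` and `parseCount` are illustrative names, not part of either answer) that applies the same rule as `getResultsCount` (text after `resultStats`, up to the next tag, digits only) to an HTML string you already have. It assumes the page still contains a `resultStats` element; if the markup differs, it returns an empty `OptionalLong` rather than 0.

```java
import java.util.OptionalLong;

// Hypothetical helper: extracts the estimated result count from a results page
// that has already been downloaded, so the parsing can be tested offline.
public class ResultStatsParser {
    // Assumption: the count follows this marker, e.g. "About 1,230,000 results".
    private static final String MARKER = "\"resultStats\">";

    public static OptionalLong parseCount(String html) {
        int start = html.indexOf(MARKER);
        if (start < 0) {
            return OptionalLong.empty(); // element not present (e.g. the layout changed)
        }
        int from = start + MARKER.length();
        int end = html.indexOf('<', from);
        String text = end < 0 ? html.substring(from) : html.substring(from, end);
        // Drop thousands separators and words, keeping only the digits.
        String digits = text.replaceAll("[^\\d]", "");
        if (digits.isEmpty()) {
            return OptionalLong.empty();
        }
        // long, because counts can exceed Integer.MAX_VALUE.
        return OptionalLong.of(Long.parseLong(digits));
    }

    public static void main(String[] args) {
        String page = "<div id=\"resultStats\">About 1,230,000 results<nobr> (0.45 seconds)</nobr></div>";
        System.out.println(parseCount(page));            // prints OptionalLong[1230000]
        System.out.println(parseCount("<html></html>")); // prints OptionalLong.empty
    }
}
```

    Returning `OptionalLong` distinguishes "no count found" from a genuine count of zero, which both answers collapse into `return 0`.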
    
  • Answer 2

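    What you can do is run an actual Google search programmatically. The simplest way is to request the URL https://www.google.com/search?q=QUERY_HERE and then scrape the result count from the returned page.

    Here is a quick example of how to do this: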

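    // Requires: java.io.IOException, java.net.URL, java.net.URLConnection, java.net.URLEncoder, java.util.Scanner
    private static long getResultsCount(final String query) throws IOException {
        final URL url = new URL("https://www.google.com/search?q=" + URLEncoder.encode(query, "UTF-8"));
        final URLConnection connection = url.openConnection();
        connection.setConnectTimeout(60000);
        connection.setReadTimeout(60000);
        connection.addRequestProperty("User-Agent", "Mozilla/5.0");
        final String marker = "<div id=\"resultStats\">";
        // try-with-resources closes the Scanner whether or not the count is found.
        try (Scanner reader = new Scanner(connection.getInputStream(), "UTF-8")) {
            while (reader.hasNextLine()) {
                final String line = reader.nextLine();
                final int start = line.indexOf(marker);
                if (start < 0)
                    continue;
                // Text between the marker and the next tag, e.g. "About 1,230,000 results"; keep only digits.
                final int from = start + marker.length();
                final int end = line.indexOf('<', from);
                final String digits = (end < 0 ? line.substring(from) : line.substring(from, end))
                        .replaceAll("[^\\d]", "");
                // long, because result counts can exceed Integer.MAX_VALUE.
                return digits.isEmpty() ? 0 : Long.parseLong(digits);
            }
        }
        return 0;
    }
    

    To use it, you can do the following:

    final long count = getResultsCount("horses");
    System.out.println("Estimated number of results for horses: " + count);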
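    On Java 11 or later, the download step can also be written with the `java.net.http.HttpClient` API in place of `URLConnection`. This is a swapped-in API, not what the answer uses, and `SearchFetcher`, `searchUri` and `fetchPage` are hypothetical names. The sketch keeps the answer's URL, 60-second timeouts and `Mozilla/5.0` User-Agent, and it assumes that any non-200 final status should be treated as an error.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.time.Duration;

// Hypothetical sketch: fetches the results page with java.net.http (Java 11+).
public class SearchFetcher {
    // Builds the same search URL as the answer; URLEncoder turns spaces into '+'
    // and percent-encodes characters such as '&' and '='.
    static URI searchUri(String query) {
        return URI.create("https://www.google.com/search?q="
                + URLEncoder.encode(query, StandardCharsets.UTF_8));
    }

    // Downloads the results page as a String, following redirects like URLConnection does by default.
    static String fetchPage(String query) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newBuilder()
                .connectTimeout(Duration.ofSeconds(60))
                .followRedirects(HttpClient.Redirect.NORMAL)
                .build();
        HttpRequest request = HttpRequest.newBuilder(searchUri(query))
                .timeout(Duration.ofSeconds(60))
                .header("User-Agent", "Mozilla/5.0")
                .GET()
                .build();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("Unexpected HTTP status " + response.statusCode());
        }
        return response.body();
    }

    public static void main(String[] args) {
        // Only builds the URL; no network request is made here.
        System.out.println(searchUri("horses and cows")); // prints https://www.google.com/search?q=horses+and+cows
    }
}
```

    The returned HTML can then be scanned for `resultStats` in the same way `getResultsCount` does.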
    
