Download all original images from a Google Images results page

  • 2 Nov 2012
  • PHP, cURL

Method 1

Using DownThemAll!, a Firefox plugin.

Method 2

Using some basic PHP programming skills.

The idea is as follows:

  1. Visit Google as if from an old Internet Explorer 6 (don't panic, just by sending headers!). This way, Google shows the results with the legacy pager, without AJAX.

  2. Get the "imgurl" from each href.

  3. Find the link to the next page, if it exists.

  4. Get the images.

Visit Google from IE 6:

<?php
/**
 * @param $url
 *   The URL of a web page
 *
 * @param $headers
 *   (array|0): Extra request headers to send, or 0 for none
 *
 * @param $binary
 *   (boolean): Tells cURL whether to fetch a binary or not
 *
 * @return
 *   The body of the $url web page
 */
function get_content ($url, $headers, $binary) {
  $ch = curl_init();
  curl_setopt($ch, CURLOPT_URL, $url);
  /* Send the spoofed User-Agent (and any other request headers), if given. */
  if (is_array($headers) && !empty($headers)) {
    curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);
  }

  curl_setopt($ch, CURLOPT_TIMEOUT, 1000);

  curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
  curl_setopt($ch, CURLOPT_BINARYTRANSFER, $binary);

  $r = curl_exec($ch);
  curl_close($ch);
  return $r;
}

/* A complete header line, so cURL can send it as-is. An old User-Agent
   (here IE 6) makes Google serve the legacy, non-AJAX results page. */
$agent = array('User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)');
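
For example, fetching the HTML of the first results page looks like this (the query URL below is only an illustration):

/* Fetch a results page as text, sending the spoofed User-Agent. */
$html = get_content("http://www.google.com/images?q=kittens", $agent, 0);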

Get the "imgurl" and the next page, if exists.

/**
 * @param $url
 *   A string containing the URL
 *
 * @param $pics
 *   The array to populate with image URLs
 *
 * @param $limit
 *   Number of result pages to get
 *
 * @return
 *   Nothing; populates the $pics array with URLs
 */
function get_images_url ($url, &$pics, &$limit) {
  global $domain, $agent;

  $search_result = get_content ($url, $agent, 0);
  /* Split the page on every anchor and inspect the chunks one by one. */
  $array = explode('<a', $search_result);
  foreach ($array as $key => $line) {

    /* Each result link carries the original image URL in its "imgurl" parameter. */
    if (preg_match('@imgurl=([a-zA-Z://.\-=?#0-9_]*)@', $line, $match)) {
      $pics[] = $match[1];
    }
    /* "Avanti" is the Italian label of the "Next" pager link. */
    if (strpos($line, "Avanti</span>") !== false) {
      preg_match('@href=\"([a-zA-Z://.\-=?#0-9_&;,]*)@', $line, $next);
      /* The href is HTML-encoded, so turn "&amp;" back into a plain "&". */
      $next_url = str_replace("&amp;", "&", $next[1]);
    }

  }
  $limit--;
  if (isset($next_url) && strlen($next_url) > 6 && $limit >= 1) {
    get_images_url ($domain.$next_url, $pics, $limit);
  } else {
    get_images_files ($pics);
  }
}
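
To make the "imgurl" regex concrete, here is a standalone check against a simplified result anchor (the markup below is a reconstruction, not Google's exact HTML):

/* Reconstructed chunk of a legacy result anchor (simplified): */
$line = ' href="/imgres?imgurl=http://example.com/photo.jpg&imgrefurl=http://example.com/page.html"';

if (preg_match('@imgurl=([a-zA-Z://.\-=?#0-9_]*)@', $line, $match)) {
  print $match[1] . "\n"; /* prints: http://example.com/photo.jpg */
}

The character class ends the capture at the first "&", so only the original image URL is taken.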

Get the images:

/**
 * This function gets each image and saves it.
 *
 * @param $urls
 *   An array containing the URLs of the image files
 */
function get_images_files ($urls) {
  global $dir;

  foreach ($urls as $key => $url) {
    /* The filename is the last path segment of the URL. */
    $url_arr = explode("/", $url);
    $filename = $url_arr[count($url_arr)-1];
    if (!file_exists($dir."/".$filename)) {

      /* get image */
      $raw = get_content ($url, 0, 1);

      /* create file */
      $fp = fopen($dir."/".$filename, "w+");
      fwrite($fp, $raw);
      fclose($fp);

      /* discard images narrower than 256px */
      $img_size = getimagesize($dir."/".$filename);
      if ($img_size[0] < 256) {
        unlink($dir."/".$filename);
      }

    } else {
      print "File {$dir}/{$filename} already exists.\n";
    }
  }
}

This function calls get_content with 0 as the second parameter, so no extra request headers are sent, and with 1 as the third, so cURL treats the response as binary. It then checks the image size: in this example I don't want any picture narrower than 256px. Yes, hardcoded.
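
The snippets above never show the few lines that tie everything together. A minimal bootstrap could look like the sketch below; $domain, $dir, the query and the page limit are all placeholder values to adapt:

/* Hypothetical bootstrap, not part of the original snippets: set the
   globals the functions above expect, then start the crawl. */
$domain = "http://www.google.com"; /* base for the relative "next page" href */
$dir    = "./images";              /* where the downloaded files end up */

if (!is_dir($dir)) {
  mkdir($dir, 0755, true);
}

$pics  = array();
$limit = 3; /* walk at most 3 result pages */

/* $agent (the User-Agent header array) was defined earlier */
get_images_url("http://www.google.com/images?q=kittens", $pics, $limit);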