Changing Search Results With hook_nodeapi

Sometimes you really need to customize your search results. This project had a very specific 'index this, not that' sort of requirement. The site was already built, and while I'm still not sure there's another way to meet these requirements, there was no way I was going to rebuild anything.

The 'index this, not that' attitude came from one of the business leads testing the Drupal Search functionality and its results. What he found was that just because you could clearly read something on a page didn't mean it would be indexed when Drupal ran its cron.php script (which handles indexing). Of course, being the developer, and trying to hide my Drupal 'newness', I felt the hammer drop pretty hard when I was charged with completing the "Index this, Not that" challenge.

The "Not That" part was simply small nodes that only provided relevancy to other nodes, i.e. nodes referenced by CCK. Take a publisher, for instance. Imagine that this node holds not just the name of the publisher but also contact information or important / private dates, right there in the node. The company wouldn't want this published, so it wasn't. In addition, it was made unsearchable within CCK.

The "Index This" part pretty much meant any page visible on the site. All content should show up in search results if it's visible on a page. Period. End of story. However, many nodes visible on the site had references to other nodes, especially the nodes I mentioned above that are neither published nor searchable.

What I learned was that Drupal doesn't index referenced data on a node. While I was totally disappointed by this, I started to understand at least one reason why. What if there were further node references in the originally referenced node? Then we'd have to load those referenced nodes too, and whatever they reference in turn, which could destroy the performance of any site.
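To make that concern concrete, here is a minimal sketch of what following references would involve. It is not Drupal code: the function name example_collect_titles, the in-memory $nodes array and its 'refs' key are all made up for illustration. The point is only that a traversal like this needs a depth limit and a record of visited nodes, or it can fan out (or loop) without end.

```php
<?php
/**
 * Hypothetical illustration, not Drupal core code: gather titles from a node
 * and everything it references, stopping after $max_depth levels.
 *
 * $nodes maps a node ID to array('title' => ..., 'refs' => array of node IDs).
 */
function example_collect_titles($nodes, $nid, $max_depth, &$seen = array()) {
  // Stop when the depth budget is used up, the node was already visited
  // (circular reference), or the node does not exist.
  if ($max_depth < 0 || isset($seen[$nid]) || !isset($nodes[$nid])) {
    return array();
  }
  $seen[$nid] = TRUE;

  $titles = array($nodes[$nid]['title']);
  foreach ($nodes[$nid]['refs'] as $ref) {
    // Each level of references costs one more load per referenced node.
    $titles = array_merge($titles, example_collect_titles($nodes, $ref, $max_depth - 1, $seen));
  }
  return $titles;
}
```

With a depth of 0 only the node itself is read; every extra level multiplies the number of loads by however many references each node carries.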

Regardless of the reason, I still needed to get this stuff indexed. Enter hook_nodeapi:
http://api.drupal.org/api/function/hook_nodeapi/
In my opinion, this is the most useful hook available in Drupal. It gives you the power to do ANYTHING you can imagine whenever something is operating on a node. When you find a limit to the power of how this hook is described and used, definitely let me know.

For this scenario, the operation ($op) was "update index". It is invoked for each node while cron is updating the search index, and it lets you add extra text to what gets indexed for that node before the text is handed to the indexer. So what you do in this situation is use node_load() (http://api.drupal.org/api/function/node_load) to extract the information you WANT to index (perhaps just the title) and put it in a string, which you then return. The search index is then built from that string in addition to the node body and other fields.

Code:


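function modulename_nodeapi(&$node, $op, $a3 = NULL, $a4 = NULL) {
  // Only act while the search module is indexing this node.
  if ($op != 'update index') {
    return;
  }

  // CCK node reference fields keep the referenced node ID under the 'nid' key.
  if (empty($node->field_referenced_field[0]['nid'])) {
    return;
  }

  // Bail out if the referenced node no longer exists.
  $loaded_node = node_load($node->field_referenced_field[0]['nid']);
  if (!$loaded_node) {
    return;
  }

  // Index the referenced node's title and, if present, its short teaser.
  $added_body = $loaded_node->title;
  if (!empty($loaded_node->field_teaser_short[0]['value'])) {
    // Separate the two with a space so the words do not run together.
    $added_body .= ' ' . $loaded_node->field_teaser_short[0]['value'];
  }
  return $added_body;
}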

So this code simply takes every node and checks for this field before it's indexed. If the field is present and not empty, it forces Drupal "Search" to index the title and short teaser fields of the referenced node right along with the original / parent node.
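The hook above only looks at the first value ([0]) of the reference field. If the field allows multiple values, something like the following sketch could collect text from all of them. The helper name modulename_referenced_text is made up, and the sketch assumes each field item carries the referenced node ID under a 'nid' key, as CCK node reference fields normally do.

```php
<?php
/**
 * Hypothetical helper: builds extra index text from every value of a
 * CCK node reference field, not just the first one.
 *
 * Assumes each item looks like array('nid' => 123).
 */
function modulename_referenced_text($items) {
  $parts = array();
  foreach ((array) $items as $item) {
    // Skip empty deltas.
    if (empty($item['nid'])) {
      continue;
    }
    // Skip references to nodes that no longer exist.
    $referenced = node_load($item['nid']);
    if (!$referenced) {
      continue;
    }
    $parts[] = $referenced->title;
    if (!empty($referenced->field_teaser_short[0]['value'])) {
      $parts[] = $referenced->field_teaser_short[0]['value'];
    }
  }
  // Join with spaces so words from different nodes stay separate.
  return implode(' ', $parts);
}
```

Inside the "update index" branch of the hook, you would then return modulename_referenced_text($node->field_referenced_field) instead of building the string by hand.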

As an anonymous commenter reminded me, the search index needs to be rebuilt for existing content when this code is first put in place. Otherwise, nodes that were indexed before the hook existed won't pick up the extra text.
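On Drupal 6 the simplest route is the "Re-index site" button on the search settings page. If you'd rather have it happen when the module code is deployed, a sketch along these lines should do the same thing from an update hook in the module's .install file. The module name and the update number 6001 are placeholders, and the sketch assumes the Drupal 6 behaviour of search_wipe(), which marks everything for re-indexing when called without arguments.

```php
<?php
/**
 * Hypothetical update hook; "modulename" and the number 6001 are placeholders.
 */
function modulename_update_6001() {
  // Called without arguments, search_wipe() flags all content for
  // re-indexing. The index itself is rebuilt on later cron runs.
  search_wipe();

  // Drupal 6 update hooks return an array of query results; none here.
  return array();
}
```

Nothing is re-indexed right away. Cron still works through the content in batches, so on a large site it can take several cron runs before every node includes the referenced text.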