Use node_load and node_save to Import Data

So you have a CSV file. As a developer, you already know how to read and manipulate it, and how to build objects or arrays from it. But you're running Drupal 6 and you want the content in this file to be content on the site. Most dev guys will tell you that you need to write custom queries to extend the Drupal schema and then write custom code to get the data on the page.

Don't forget... BYODD: Back Up Your Own Database, Dummy! We're not responsible if you screw up your database by following these directions.

The fact is, you want to be able to use this data with popular Drupal modules like Views... and you know that custom queries and schema changes will not give you the nodes this data needs to actually be useful.

I'm going to tell you how to import your CSV as CCK field data so that you can do things like make custom lists in Views. This approach eliminates the need to architect a table or make other schema changes, AND it eliminates the need to figure out what data structure you need once you start reading the CSV into memory.

  1. Create the content type in Drupal with a CCK field for each CSV column you want represented.
  2. Build one node of that type through Drupal, using real or dummy data. It can be published or not.
    This takes care of the schema.
  3. Take note of this node's ID for use in your import script.
  4. In your script, run $a_node = node_load($node_id) and inspect the result using print_r() or var_dump().
    This takes care of the data structure.
  5. VERY IMPORTANT: execute unset($a_node->nid). This forces node_save to assign a new node ID for the new data. Because node_save writes the new ID back onto the object it saves, repeat the unset before every save, not just once.
  6. Now fopen the CSV file and start iterating through it.
  7. For each row read from the file, replace the data in memory with the values from that row, being careful to empty any field the row leaves blank so the previous data doesn't carry over.
  8. After each iteration, execute node_save($a_node).
  9. THAT'S IT!! You now have a single node for each line in the CSV file.
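
Here is a rough sketch of steps 4 through 8, not a finished importer. It makes several assumptions that are mine and not part of the steps above: the CSV has no header row, the first column is the node title, and every mapped CCK field is a plain text field that keeps its data under the 'value' key (whatever print_r() shows you in step 4 wins over this). The function names csv_row_to_node and import_csv, the field name field_city, and the node ID 123 are all made up for illustration. The save function is passed in by name so the logic can be tried outside of Drupal.

```php
<?php
/**
 * Build a new, unsaved node object from a template node and one CSV row.
 *
 * $field_map is a hypothetical mapping of CSV column index => CCK field name.
 */
function csv_row_to_node($template, array $row, array $field_map) {
  // Clone so the template loaded by node_load() is never modified.
  $node = clone $template;

  // No nid means node_save() inserts a new node instead of updating.
  // vid is dropped as well, purely as a precaution.
  unset($node->nid, $node->vid);

  // Assumption: the first CSV column holds the node title.
  $node->title = isset($row[0]) ? trim($row[0]) : '';

  foreach ($field_map as $column => $field_name) {
    $value = isset($row[$column]) ? trim($row[$column]) : '';
    // Replace the whole field so no template data leaks through;
    // an empty CSV cell becomes a NULL field value.
    $node->$field_name = array(
      0 => array('value' => ($value === '') ? NULL : $value),
    );
  }
  return $node;
}

/**
 * Read the CSV and hand each built node to $save ('node_save' in Drupal).
 * Returns the number of rows handed to $save.
 */
function import_csv($path, $template, array $field_map, $save) {
  $handle = fopen($path, 'r');
  if ($handle === FALSE) {
    return 0;
  }
  $count = 0;
  while (($row = fgetcsv($handle, 0, ',')) !== FALSE) {
    // fgetcsv() returns array(NULL) for a blank line; skip those.
    if ($row === array(NULL)) {
      continue;
    }
    $node = csv_row_to_node($template, $row, $field_map);
    // Called with a variable so a by-reference function like
    // node_save(&$node) receives the node properly.
    $save($node);
    $count++;
  }
  fclose($handle);
  return $count;
}

// Inside a bootstrapped Drupal 6 site this might be wired up as:
//   $template = node_load(123);  // 123 is a placeholder node ID
//   import_csv('/path/to/data.csv', $template,
//     array(1 => 'field_city'), 'node_save');
```

Cloning the template for every row sidesteps the nid problem from step 5 entirely: each row gets a fresh object that has never been through node_save, so there is no leftover ID to forget about.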

SOME USEFUL QUERIES that helped during this project:
Get the last node ID for the content type "story" (handy to note before you run the import):
select max(nid) from node where type = 'story';

If, for some reason, you need to delete the nodes and start over, restore your database... or write a script that gets all the new node IDs and then deletes each one by executing node_delete($node_id).
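
A cleanup script along those lines could look something like this. It is only a sketch and assumes you wrote down the max(nid) value before the import and that nobody created other nodes of that type in the meantime, since anything with a higher ID gets deleted. The function name delete_imported_nodes and the number 1234 are placeholders of mine. As with any delete script, BYODD applies.

```php
<?php
/**
 * Delete every node ID in $nids that is greater than $last_nid_before_import.
 *
 * $delete is the deleting function: 'node_delete' inside Drupal.
 * Returns the IDs that were passed to $delete.
 */
function delete_imported_nodes(array $nids, $last_nid_before_import, $delete) {
  $deleted = array();
  foreach ($nids as $nid) {
    // Query results arrive as strings; compare as integers.
    $nid = (int) $nid;
    if ($nid > $last_nid_before_import) {
      $delete($nid);
      $deleted[] = $nid;
    }
  }
  return $deleted;
}

// Inside a bootstrapped Drupal 6 site this might be wired up as:
//   $result = db_query("SELECT nid FROM {node} WHERE type = '%s'", 'story');
//   $nids = array();
//   while ($row = db_fetch_object($result)) {
//     $nids[] = $row->nid;
//   }
//   // 1234 stands in for the max(nid) you noted before the import.
//   delete_imported_nodes($nids, 1234, 'node_delete');
```

One thing to watch: as far as I can tell, node_delete in Drupal 6 checks whether the current user is allowed to delete the node, so run the script as a user with that permission or it may quietly delete nothing.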