Migrating data in Drupal 5 to 6 using Table Wizard and Migrate (Part 1 - Taxonomies)

This will be the first in a few posts on how I am trying to move data from a series of Drupal 5 sites to one Drupal 6 site.

Initially, I tried to migrate the data with the Node Export module, as there are versions for both Drupal 5 and 6. It works pretty well and gives you nice export files that the D6 version imports with no problem. But, and there's always a "but", there are a couple of problems with this. The main one is that each node has multiple taxonomy terms, and the term ids on the D6 site don't match the ones on the D5 site (I had previously exported and imported the taxonomies with the Taxonomy import/export via XML module).

So how to fix this? Two possibilities came to mind: run multiple search-and-replace passes to change the old term ids to the new ones, OR edit the nodes after import and redo all the taxonomy term selections. The former is a bit less intensive, but both require tons of work. So I looked for another option.

Enter the Table Wizard and Migrate modules. I am using the 2010-Mar-21 dev version of TW and the 1.0 version of Migrate (with Schema 6.x-1.7 and Views 6.x-2.10) for this.

The basic process here is to load the data tables you want to import into Table Wizard, and then get the Migrate module to do the actual import.

These modules are theoretically able to import the D5 data directly from the D5 database. All I needed to do was make that database accessible to the Table Wizard by adding a configuration line to settings.php. However, for some reason, this didn't work for me, so I simply added the D5 tables to the same D6 database. Luckily, the D5 tables are all prefixed, so there are no clashes with the D6 tables. In the examples shown below, I use the "b_" prefix for the old tables.

The first things I need to import are the various vocabularies. I need those in place before I import the actual nodes, so that I can map the old node term ids to the new ones.

Go to the Table Wizard page in Content Management and click on Add existing tables. Now select the b_term_data and b_term_hierarchy tables from the D5 site and click Add tables. I left the other settings alone, i.e. Skip full analysis = off; Provide default view = on; Default view name = blank.

Now the two tables are shown at the top of the page, and you can see how many rows there are for each (they should match). Clicking on the table name in the first column takes you to a settings page where you can select which fields of the table to ignore, and leave comments about what each field means. I didn't do anything here for the b_term_data table, but on the b_term_hierarchy table I unticked the PK (primary key) checkbox for the parent field. This is because the modules only act on tables with ONE primary key field.

Now, to ensure that I keep the hierarchy in my taxonomy, I created a Relationship between the data and hierarchy tables. I set "Field from the base table of the relationship" to b_term_data.tid and "Corresponding field from a table to be joined to the base table" to b_term_hierarchy.tid. I left the "Incorporate related table into views automatically" option on Automatic.

If you click on the b_term_data link in the second column (on the Tables page), you'll see a view of all the data, including the parent term id in the last column. If you don't, simply edit the View and add b_term_hierarchy: parent as a new field to the view.

Usually, you want to import one vocabulary at a time, so I set a Filter to restrict the view to show only one vid. You could set this as an argument instead of a filter, and then specify the vocabulary id in the Migrate module settings as a views argument. If you need to import multiple vocabularies, then this is probably a good option.
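To make it clearer what the view is showing at this point, here is a rough sketch in Python with an in-memory SQLite database. It is only an illustration of the idea (a join on tid, restricted to one vid), not the SQL that Views actually generates. The table and column names follow the D5 schema with my "b_" prefix; the sample rows and the terms_for_vocabulary helper are made up for the example.

```python
import sqlite3

# Throwaway in-memory database with a cut-down copy of the two D5 tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE b_term_data (tid INTEGER PRIMARY KEY, vid INTEGER, name TEXT,
                              description TEXT, weight INTEGER);
    CREATE TABLE b_term_hierarchy (tid INTEGER, parent INTEGER);
    -- Made-up sample rows: two terms in vocabulary 1, one in vocabulary 2.
    INSERT INTO b_term_data VALUES
        (10, 1, 'Fruit', '', 0),
        (11, 1, 'Apple', '', 0),
        (20, 2, 'Red',   '', 0);
    INSERT INTO b_term_hierarchy VALUES (10, 0), (11, 10), (20, 0);
""")

def terms_for_vocabulary(conn, vid):
    """Return (tid, name, parent) rows for one vocabulary.

    Hypothetical helper: the vid parameter plays the role of the
    view's filter (or argument).
    """
    return conn.execute(
        """
        SELECT d.tid, d.name, h.parent
        FROM b_term_data d
        LEFT JOIN b_term_hierarchy h ON h.tid = d.tid
        WHERE d.vid = ?
        ORDER BY d.tid
        """,
        (vid,),
    ).fetchall()

rows = terms_for_vocabulary(conn, 1)
for row in rows:
    print(row)
```

Passing the vid in as a parameter is the equivalent of the argument approach: the same query serves every vocabulary, which is why it suits multiple imports.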

Alright, that's all done, so now we can go to the Migrate module page in Content Management.

The first thing to do here is create a new content set. This is where you define the import parameters. Type whatever you want for the name (read the help text here) and description. For the Destination, select Taxonomy Term, and for the Source view, select the Table Wizard b_term_data view (tw: b_term_data (term_data)). If you want to, you can select the Views argument here too. I left the weight as zero.

On the next screen, I left the primary key as tid, weight as zero and separator as a comma. The next set of options allows you to map where each field should go. Set the Name, Description, Weight and Parent options to the appropriate source fields. Leave the others as none. You might think you should set the Existing term ID and Vocabulary, but if you do, then the import will try to use those same values on the new D6 site. Leaving them set to none allows the terms to get new term ids (and prevents term id clashes with other taxonomy imports). In the default value field for Vocabulary, I entered the vocabulary id of the D6 vocab I wanted to import the terms into.
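Here is a small Python sketch of why this mapping works: every term gets a fresh id, and the parent reference is translated from the old id to the new one so the hierarchy survives. I don't know how Migrate does this internally, so treat it as an illustration only. The import_terms function, the two-pass approach, the sample terms and the starting id of 101 are all made up.

```python
def import_terms(source_rows, target_vid, next_free_tid):
    """Assign new term ids and remap parents (illustrative only).

    source_rows: (old_tid, name, description, weight, old_parent) tuples.
    Returns the new term records and the old-tid -> new-tid mapping.
    """
    # Pass 1: hand out a fresh id for every old term id.
    id_map = {}
    for old_tid, *_ in source_rows:
        id_map[old_tid] = next_free_tid
        next_free_tid += 1

    # Pass 2: build the new records, translating each parent reference.
    # A parent of 0 (top level) is not in the map, so it stays 0.
    new_terms = []
    for old_tid, name, description, weight, old_parent in source_rows:
        new_terms.append({
            "tid": id_map[old_tid],
            "vid": target_vid,          # the Vocabulary default value
            "name": name,
            "description": description,
            "weight": weight,
            "parent": id_map.get(old_parent, 0),
        })
    return new_terms, id_map

source_rows = [
    (10, "Fruit", "", 0, 0),
    (11, "Apple", "", 0, 10),
    (12, "Pear", "", 1, 10),
]
new_terms, id_map = import_terms(source_rows, target_vid=3, next_free_tid=101)
for term in new_terms:
    print(term["tid"], term["name"], "parent:", term["parent"])
```

Because the new ids are handed out by the target site rather than copied from the source, terms from several D5 sites can go into the same D6 database without colliding.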

Alright, time to do the import. Tick the Import checkbox next to the content set you want to import. Open up the Execute fieldset, and click Run. If you only want to import a part of the data, enter the number or data range in the provided boxes. If you want to reverse the import afterwards, tick the Clear box, and then click Run. If all goes as it should, you'll have all your terms in your new vocabulary, with the hierarchy intact.

Now to repeat all this for the other 5 sites....

Note that the Migrate module will create a table named migrate_map_content_set_name (with your own content set's name in place of content_set_name), which maps the old term ids to the new ones. What a clever module! This is EXTREMELY useful for later on.
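To show why that map table is so useful, here is a sketch of how it could translate the old node-to-term links. It uses Python with an in-memory SQLite database. The map table name migrate_map_d5_terms is made up, and the sourceid and destid column names are an assumption on my part, so check the real table before relying on them. The sample rows are invented too.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- D5 node-to-term links (with the "b_" prefix).
    CREATE TABLE b_term_node (nid INTEGER, tid INTEGER);
    -- Stand-in for the map table; name and columns are assumptions.
    CREATE TABLE migrate_map_d5_terms (sourceid INTEGER PRIMARY KEY,
                                       destid INTEGER);
    INSERT INTO b_term_node VALUES (1, 10), (1, 11), (2, 11);
    INSERT INTO migrate_map_d5_terms VALUES (10, 101), (11, 102);
""")

# Swap each old term id for its new one by joining through the map.
new_term_rows = conn.execute("""
    SELECT n.nid, m.destid
    FROM b_term_node n
    JOIN migrate_map_d5_terms m ON m.sourceid = n.tid
    ORDER BY n.nid, m.destid
""").fetchall()

for nid, new_tid in new_term_rows:
    print("old node", nid, "-> new term", new_tid)
```

The node ids here are still the old D5 ones; they would need the same kind of translation once the nodes themselves have been migrated.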

I'll write more blog posts when I do the user, node and comment migrations (in that order).
