Drupal Migrate API: Migrate content from a JSON source

Posted on July 20, 2019

Migrate API

Migration plugins are the glue that binds a source, a destination, and multiple process plugins together to move data from one place to another. Often referred to as migration templates, or simply migrations, migration plugins are YAML files that explain to Drupal where to get the data, how to process it, and where to save the resulting object.

Migrations is an Extract - Transform - Load (ETL) process:

D8 Migrate ETL Process

  • extract phase is called source
  • transform phase is called process
  • load phase is called destination

Contrib module: migrate_plus

Migrate Plus extends the functionality of the Drupal Migrate API, adding new features, including the url source plugin, allowing us to perform migrations from JSON, XML and Soap file formats.

Contrib module: migrate_tools

Migrate Tools adds tools for running and managing migrations. Most importantly, the following drush commands we will be running when using migrations:

  • migrate-status - Lists all migrations and their status.

  • migrate-import - Runs a migration.

  • migrate-rollback - Rollback a migration.

  • migrate-reset-status - Resets a migration status to idle. Used if something goes wrong during the migration-import process, and the status gets stuck on "Importing".

Migration Json Example (from dados.gov.pt):

{"d": [
    {
        "PartitionKey":"mercadosamadora",
        "RowKey":"636541185252326807",
        "Timestamp":"2018-02-13T11:35:26.1849199Z",
        "entityid":"4e72fed3-3c67-45bb-a53ce125e59e36d0",
        "categoria":"Mercados",
        "subcategoria":"Mercados",
        "designacao":"Mercado de Alfragide",
        "morada":"Largo do Movimento das Forças Armadas",
        "freguesia":"Alfragide",
        "latitude":"38.735400753299999",
        "longitude":"-9.217441079000000"
    },
 ...
]}

A JSON Migration is all about configuration

A migration, when using already existing plugins, is nothing more than a .yml file, explaining how the data from the source should reach the destination.

In our example, the source will be a JSON file, and our destination will be an entity (node).

A Content Type was created beforehand, with the following fields that will be filled from the source file:

Node market:

title - Name of the market. Will be the designacao from the source file.

field_coordinates - A Geofield with the latitudes and longitudes from the source file.

field_freguesia - A taxonomy of type freguesia that will match the field with the same name from the source file.

field_id - A unique identifier that will match the entityId from the source file. We will use this as our migration ID.

field_address - Address of the market that will match the field morada from the source file.

And the taxonomy vocabulary Freguesia that will be used for the field_freguesia was also created.

Module migrate_markets:

Migrations needs to be attached to a module, as they will house the config files.

A custom module was created, under the modules/custom folder.

migrate_markets.info.yml

name: 'migrate_markets'
type: module
description: 'Migrates Amadora Markets.'
core: 8.x
package: 'Custom'
dependencies:
  - migrate
  - migrate_plus

And under the config/install folder of the module, we will create our migration file migrate_plus.migration.market.yml.

General identifying information

We can start off by setting the basic information for our migration there:

# The machine name for a migration. Also used by various CLI tools when
# referencing the migration as an argument for a command.
id: market

# A human-friendly description of the migration. This is used by various UI and
# CLI tools when showing a list of available migrations.
label: Migrate Amadora Markets from dados.gov.pt

# Optional group for the migration. This is a feature provided by the
# migrate_plus module and allows for running an entire group of migrations
# together.
migration_group: market

Destination

We can also go ahead and set the destination plugin, since it's very straightforward:

# Every migration must also have a destination plugin, which handles writing
# the migrated data in the appropriate form for that particular kind of data.
# This value should be the ID of the destination plugin to use.
#
# This is the load phase of your migration.
destination:
  plugin: entity:node

This tells the migration that it should place the data into a node. We can set which node type (bundle) that will later, on the process phase.

Source

For the source plugin, we will use the aforementioned url plugin, that comes with the contrib migrate_plus module.

# Every migration must have a source plugin. Set this to the ID of the plugin to
# use.
#
# This is the extract phase of your migration.
source:
  plugin: url
  # Specifies the http fetcher plugin.
  data_fetcher_plugin: http
  # Specifies the JSON parser plugin.
  data_parser_plugin: json
  headers:
    Accept: 'application/json; charset=utf-8'
    Content-Type: 'application/json'
  # One or more URLs from which to fetch the source data.
  urls:
    - https://dados.gov.pt/s/dadosGovFiles/mercadosamadora.json
  # For JSON, item_selector is the xpath used to select
  # our source items.
  item_selector: 'd'
  # For each source field, we specify a selector,
  # the field name which will be used to access the field in the
  # process configuration, and a label to document the meaning
  # of the field in front-ends.
  fields:
    -
      name: title
      label: 'Title'
      selector: /designacao
    -
      name: id
      label: 'ID'
      selector: /entityid
    -
      name: freguesia
      label: 'Freguesia'
      selector: /freguesia
    -
      name: address
      label: 'Address'
      selector: /morada
    -
      name: latitude
      label: 'Latitude'
      selector: /latitude
    -
      name: longitude
      label: 'Longitude'
      selector: /longitude

  # Under ids, we specify which of the source fields retrieved
  # above (id in this case) represent our unique
  # identifier for the item, and the schema type for that field.
  ids:
    id:
      type: string

This is a lot of different entries, so let's go step by step:

Plugin

We will use the url source plugin, located at modules/contrib/migrate_plus/src/Plugin/migrate/source/Url.php

This source plugin will have a fetcher and a parser plugin associated with it.

Data Fetcher

Migrate plus supports these data fetchers, located at modules/contrib/migrate_plus/src/Plugin/migrate_plus/data_fetcher:

  • file File.php
  • http Http.php

Their use should be pretty straightforward. If we have the source file locally in our Drupal site, we would use file. Since we have an endpoint that automatically updates itself on an external URL, we will use the http fetcher instead. Note: This option supports Authorization. In our case, the endpoint is public, so it won't be necessary. The file itself provides an example annotation, that you can use if you want to know all the values it accepts.

Data Parser

As for data parsers, migrate_plus supports the following, located at modules/contrib/migrate_plus/src/Plugin/migrate_plus/data_parser:

  • JSON Json.php
  • SimpleXml SimpleXml.php
  • Soap Soap.php
  • Xml Xml.php

We will use JSON on our examples, so we won't look over at all the other ones.

Headers

The headers property should be self-explanatory, and is used in conjunction with our http data fetcher. For our example, we just set the headers that prepare the request to receive a JSON response.

URLs

Here we set the URL(s) of our source, so the migration knows where to go get it. Also pretty self-explanatory.

Item Selector

This will be an xPath selector of the root where our data is. If you go look at our JSON example above, you will notice that all our data is inside an array of key d. So we tell the migration to go look inside that. If your JSON data is already inside the root of the JSON, you can set this to NULL.

Fields

fields is an array that will sort of serve as our bridge between the JSON fields into variables we can use on our process plugin.

They should have a name, which will let us reference them in the migration file.

The label is mostly used as an identifier for any UI dealing with migrations.

The selector, much like the previous item selector property, is an xPath selector indicating where the field in our source is located.

ID

The id property will match our unique identifier for this migration. This will ensure that, when re-running a migration, items that were already processed once won't be processed again.

An easy way to look at it, is to understand that the source plugin should answer the following questions:

Where is the source coming from? Defined on the urls property.

What am I getting from the source? Defined using the fields property.

How should I get the data from the source? Defined using the all the other properties.

Process

The only step left is configuring our process plugin.

# Here's the meat of the migration - the processing pipeline. This describes how
# each destination field is to be populated based on the source data. For each
# destination field, one or more process plugins may be invoked.
#
# This is the transform phase of your migration.
process:
  # Hardcode the destination node type (bundle) as 'market' using the
  # 'default_value' process plugin.
  type:
    plugin: default_value
    default_value: market

  # Simple field mappings that require no extra processing can use the default
  # 'get' process plugin. This just copies the value from the source to the
  # destination. 'get' is the default when no plugin is defined, so you can just
  # do destination_field: source_field.
  title: title

  field_id: id

  field_freguesia:
    plugin: entity_generate
    source: freguesia
    value_key: name
    bundle: freguesia
    entity_type: taxonomy_term

  field_address: address

  field_coordinates:
    plugin: geofield_latlon
    source:
      - latitude
      - longitude

Here we set how each source field should be processed by Drupal before ending up as fields for our destination node.

Each field will have at least one process plugin associated with it. If you don't explicitly set a plugin, and only do the mapping between Drupal destination field and source field, like our title, for example:

title: title

Then, by default, the get plugin will be used. It is a very basic plugin, that indicates that no processing of the JSON field is needed, and that the field should just be introduced as-is. So, the previous example, in the same as this:

title:
  plugin: get
  source: title

The key should be the destination (in our case, a node) field name, while the value should be the name attribute defined in our source field array.

Some fields will require the use of a process plugin. For example, to define our node type, in this example, we will hardcode the bundle of the node (aka the Content Type machine name), since we know that these nodes must all be of type market.

We can use the default_value plugin for this, which will provide a default value, if no source is provided or if the value from the source returns empty (Useful for fields that might not be set for all the JSON objects, but need to always have a value on our Drupal Entity).

We are using two contrib process plugins, one from the migrate_plus module, called entity_generate, and another one from the geofield module, called geofield_latlon.

The geofield_latlon plugin is very straightforward. it takes in both a latitude and a longitude values (that we set in our source fields), and creates a GeoField with them.

The entity_generate plugin will:

  • Look for a taxonomy_term of type freguesia, and match the source field freguesia to the name of the taxonomy.
  • If a Taxonomy with that name already exists, that will be the one used.
  • If not, one is created with that name.

In our example migration, we won't be using any custom process plugins, but if the plugins available from Core and the Migrate Plus module don't quite fit your needs, you can always easily make your own.

And that's it! Our final migrate_plus.migration.market.yml file looks like this:

id: market
label: 'Migrate Markets from dados.gov'
migration_group: market
destination:
  plugin: 'entity:node'
process:
  type:
    plugin: default_value
    default_value: market
  title: title
  field_id: id
  field_freguesia:
    plugin: entity_generate
    source: freguesia
    value_key: name
    bundle: freguesia
    entity_type: taxonomy_term
  field_address: address
  field_coordinates:
    plugin: geofield_latlon
    source:
      - latitude
      - longitude
source:
  plugin: url
  data_fetcher_plugin: http
  data_parser_plugin: json
  headers:
    Accept: 'application/json; charset=utf-8'
    Content-Type: application/json
  urls:
    - 'https://dados.gov.pt/s/dadosGovFiles/mercadosamadora.json'
  item_selector: d
  fields:
    -
      name: title
      label: Title
      selector: /designacao
    -
      name: id
      label: ID
      selector: /entityid
    -
      name: freguesia
      label: Freguesia
      selector: /freguesia
    -
      name: address
      label: Address
      selector: /morada
    -
      name: latitude
      label: Latitude
      selector: /latitude
    -
      name: longitude
      label: Longitude
      selector: /longitude
  ids:
    id:
      type: string

All that's left is to activate our module with drush en migrate_markets, and if everything went right we should already be able to see our migration show up using drush migration-status:

drush migration-status

We can run our migration using drush migration-import market. When dealing with a large number of migration entities, it's recommended to use the limit flag with a small number, like --limit=10 or similar, when running the migration for the first time, to test if everything goes smoothly.

drush migration-import market

Note: I am using Lando as my development environment, so all my console commands will have lando prepended to them. Unless you're using it too, you probably won't need to type that.

And that's it! We just powered out website up with all the Markets from the Amadora region. We could use that data to show a map of all the Markets around Amadora, for example:

Homepage Mercados Amadora

Or to list all the Markets on a specific Freguesia:

Mina de Água Mercados Amadora

Tags: drupal, migrate, json