Troubleshooting Spreadsheet Ingests

Last Updated, January 9, 2024.

 

This page includes information on commonly encountered issues when ingesting a spreadsheet and how to resolve these errors. In Modern Islandora there is a significant improvement in the detail provided for errors that you may encounter during spreadsheet ingest as it often provides the field name and/or even row number to help you identify where the issue was encountered during the ingest process. When there are significant errors, the system will either not ingest the content at all or will ingest some of the metadata record information with the status of those items being unpublished.

Have an error you can't resolve or need a second set of eyes to help determine what the issue is? Message Michael Howser on the CTDA slack channel and provide the link to the migration group and he will look into what might be causing the error. 

 

 

How Do I Review Spreadsheet Ingests for Errors?

Once you have completed your spreadsheet ingest, click on View and the name of your Migration group. On the following page, click on the messages number to view any messages. The key message statuses you are looking for are Failed as these will not have been ingested. Included below are examples of messages that you may encounter and what these common messages mean.

 

 

Dates

In Modern Islandora, the dates are in EDTF format and validated so you may encounter some messages from your ingests if the system receives a date which is not formatted correctly or is populated to the wrong field. 

In the following example, for Create nodes from rows in the csv migration for ID2 we received the following message: 

explode: DateTime::__set_state(array( 'date' => '1899-12-30 00:00:43.000000', 'timezone_type' => 3, 'timezone' => 'America/New_York', )) is not a string

 

This message indicates an issue with ID2 in our spreadsheet ingest with a significant issue with DateTime. Let's review the spreadsheet to determine what the issue may be causing this failed message.

explode_DateTime_spreadsheet_data_entry

As we can see in the column origin_information, ID2 cell Y3 in the above example includes the data in the correct EDTF format but the date is not populated into the correct subfield within origin_information. As we want to populate this as the date created value, the entry in cell Y3 should be ||2020-08-12. The spreadsheet instead was trying to populate this date value into the event_type field which is not a date field, hence the explode error.

 

 

Digital File

The path to the object you want to ingest in the CTDA sandbox is prepended with a the path /opt/ingest_data/csv_image_upload/ followed by the name of the file. This path name is only valid if the file has already been uploaded via SFTP by the CTDA on your behalf in the sandbox environment. If you have files you want to test ingesting via spreadsheet, just slack Michael Howser to have these uploaded so you can test.

In the following example in Messages of Create files from rows in the csv we have two messages for ID2 which are as follows "Destination Full Path: repo-bin://2023-02/095A8516.mp4" and "file_copy: File '/opt/ingest_data/csv_image_upload/095A8516.mp4' does not exist" which both result in a Failed status.

pathname_incorrect_digital_file

Let's look at the spreadsheet to review these messages. For ID2 in the digital_file column (cell I3) we see a file path populated which looks correct at first glance.

pathname_incorrect_digital_file_spreadsheet

The file path is correct, but in this case the actual file name is incorrect as the extension .mp4 on the original file I was uploaded was actually .MP4. The file extensions as well as the file names and paths are case sensitive so it's a good idea to double check these are correct if you see messages like the ones illustrated above when ingesting content.

 

 

Explode

In Modern Islandora, the system provides strict validation of fields and the type of information expected for a field. If there is a text value entered in a field where the system expects a number, or a number not formatted as the system expects, the explode error message will appear in the create nodes from rows in the CSV error message for your spreadsheet ingest. 

In the following example, in the Create nodes from rows in the csv migration ID1 and ID2 have failed status messages which indicate explode: 10426925 is not a string

create nodes from rows in CSV message

To troubleshoot this, let's look for this number in the spreadsheet that was ingested. As we can see when viewing the spreadsheet in Microsoft Excel, there is a green triangle in the top left corner of this cell, indicating we have a number entered in the field, and if we hover over the cell, a yellow exclamation alert will appear. 

oclc_number_stored_as_text_indicator_excel

If we click on the yellow exclamation mark, we can change this field to a number by selecting the Convert to Number option. If you have multiple cells with this same issue in the column, highlight the cells to update and then click the yellow exclamation mark to update all the text to numbers in these cells.oclc_number spreadsheet number stored as text menu

Now you can re-upload and re-ingest the spreadsheet and this issue will be resolved.

---

Another example of an explode message you may encounter is a subdelimited_explode message in the Create nodes from rows in the csv migration ID1 has a failed status messages which indicates subdelimited_explode: The value provided to the subdelimited_explode must be a string; actual provided value: DateTime::.. which provides a clue that in a field there is a value that is entered is not what is expected for the field. 

create nodes from rows in CSV date not delimited as expected

Let's take a look at the spreadsheet to see what may have caused this message. When looking at the spreadsheet in the origin_information column for ID1 we discover that the data value is listed and looks to be in EDTF format, but this field has multiple subfields this message clearly shows the information in this field is not populating to the correct field.

exploded subfield error date spreadsheet

Upon closer examination we can see that the date created field value in the spreadsheet is populating to the first subdelimited field for this column, which is actually event type which would not be a date value. To resolve this issue, we will need to use the pipe | subdelimiter to populate this value to the data created subfield, The correct entry for the spreadsheet would be ||2021-06-11 to add the date created to the ingest.

origin_information subdelimiter field in spreadsheet

Now we can save and re-ingest the spreadsheet and this message will be resolved.

 

 

 

Model & Bundle

In Modern Islandora, the system strictly validates many more fields than in the Islandora 7.x and this section provides some common errors you may encounter related to model and 'bundle' related issues.

In the following example, the message for the migration Create image media from rows in the csv. indicates as follows: No static mapping found for ''Paged Content'' and no default value provided for destination 'bundle'

no_static_mapping_found_paged_content_message

When reviewing the ingest spreadsheet, the message indicates that the issue is with ID 1 (cell A2) we see the model was entered as Paged Content which is correct but I accidentally pasted a value for a digital_file which should be empty for Paged Content.

Model_bundle_spreadsheet_issue

To fix this, the digital_file cell I2 in the example above for the Paged Content ensure this cell remains blank.

If you receive a message related to 'bundle' take a look at the row the message indicates, review the model, and determine if the digital_file is missing or should not be included for that content model.

 

The following message of the Create image media from rows in the csv "No static mapping found for ''Digital Document'' and no default value provided for destination 'bundle'." indicates

create_image_media_from_rows_in_the_csv_document_bundle_message

In this example the issue is the model "Document" is not a valid value for this field but rather it should be "Digital Document" for the model.

 

 

 

Notes

To review errors after ingesting a spreadsheet from the Islandora Spreadsheet Ingest page click View and then click on the Migration Group Name for your ingest. This will display the Migrations page which provides a list of important information and messages related to the spreadsheet you ingested. A good place to start is to look for any messages that may have been created from the ingest process. We will click on the number 4 in this example as this corresponds to the Messages column.

Create Nodes from Rows in the CSV notes error migration page

Next, we will review these messages

Create nodes from rows in the CSV notes error messages page

Based on the messages above, it is indicating the issue is with item ID2 in our spreadsheet and the message indications field_note_paragraph giving us a clue the issue is with the notes field and type note validation indicates the value is not valid or missing for the type note.

When we look at the original spreadsheet we ingested we see the following:

Create nodes from rows in the csv notes error original spreadsheet example

From above we can see there is no indication as to the type note for the information entered. To correct this we need to add one of the valid type note fields with the appropriate delimiter. 

Below is the corrected version of this field in the spreadsheet that once updated we can conduct a new ingest and the messages will no longer appear and the item will ingest.

Create notes from rows in the CSV notes error corrected spreadsheet

 

 

 

 

 

 

Reformatting Quality

Each group, collection (parent), and object within the repository has a node associated with it. The following is an example of an error which indicates an issue with the node number. The message of the Create nodes from rows in the csv states "[node: 504, revision: 1100]: field_reformatting_quality.0=The value you selected is not a valid choice.".

create_nodes_from_rows_in_the_csv_node_error_message

As the message indicates the field_reformatting_quality is not a valid choice, let's look at the spreadsheet to examine this further. As the spreadsheet image indicates below, for ID1 the value entered in the column reformatting_quality cell AC2 the valid value for this field is "access" as the fields are case sensitive. Once this is corrected the error message will be resolved. Valid values for this field are: access preservation replacement

spreadsheet_reformatting_quality_error