
Duplicating questionnaires

Samiwel Thomas edited this page Oct 1, 2018 · 2 revisions

Method 1: Maintain reference of original and new entities

This method would allow us to keep all of the existing benefits that our database provides, such as referential integrity, but at the expense of additional overhead and complexity in the duplication process.

Essentially it involves duplicating the rows for each entity whilst keeping track of the duplicated entities and their new Id values.

The benefit of this approach is that it is methodical and you can reason about it. There's probably some logic that could be re-used at each level to adhere to DRY principles where possible.

The downside of this approach is that if we introduce new entities into the questionnaire model, then the duplication logic would need to be maintained to deal with the new entities. It will also get a little gnarly when we have multiple foreign key relationships to deal with, e.g. in the case of routing. But it is not unachievable - just tedious.
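The re-usable per-level logic hinted at above could look something like the following minimal sketch. Note that `copyEntities` and `insertRow` are hypothetical names for illustration, not functions that exist in the codebase:

```javascript
// Sketch of a generic per-entity copy step: duplicate each row, rewriting
// one foreign key, and return a map of original Id -> new Id.
// `insertRow` (persists a row and returns its new Id) is a hypothetical
// stand-in for the real data-access layer.
function copyEntities(rows, foreignKey, newForeignKeyValue, insertRow) {
  const idMap = {};
  for (const row of rows) {
    const { id, ...rest } = row;
    const newId = insertRow({ ...rest, [foreignKey]: newForeignKeyValue });
    idMap[id] = newId;
  }
  return idMap;
}
```

Each step below (metadata, sections, pages, answers) could then call something like this with the appropriate foreign key and the Id map produced by the previous step.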

Steps (these are approximate and may change during implementation)

  1. Copy the questionnaire

    1. Copy questionnaire
    2. Save the original and the new questionnaire Id.
    const copiedQuestionnaire = {
      [originalQuestionnaireId]: newQuestionnaireId
    };
  2. Copy the metadata

    1. For each metadata entry for original questionnaire...
    2. Insert a new metadata entry for new questionnaire.
    3. Maintain a reference to the new Id values
    const copiedMetadata = {
      [originalMetadataId]: newMetadataId
    };
  3. Copy the sections

    1. For each section in original questionnaire...
    2. Copy the section replacing originalQuestionnaireId with newQuestionnaireId.
    3. Save the original and new section Ids
    const copiedSections = {
      [originalSectionId]: newSectionId
    };
  4. Copy the pages for each section

    1. For each original section...
    2. For each page...
    3. Copy the page using existing page duplication logic
    4. Maintain a reference to each copied page.
    5. Maintain a reference to each page containing a piped value
    const copiedPages = {
      [originalPageId]: newPageId
    };
    const pagesContainingPipedValues = {
      [pageId]: ["title", "guidance"]
    };
  5. Copy the answers for each page

    1. Existing logic to duplicate answers
    2. Maintain a reference to each copied answer.
    const copiedAnswers = {
      [originalAnswerId]: newAnswerId
    };
  6. Update piped values

    An additional pass is required to update piped values. Currently we only support piping values into the title, description and guidance fields of question pages.

    1. For each page containing a piped value (see 4.5)
    2. Pick the title, description and guidance
    3. Update any piped answer Id / metadata Id by looking up the new Ids captured in steps 2.3 and 5.2
    forEach(pagesContainingPipedValues, (fieldsToUpdate, pageId) => {
      const pageModel = getPage(pageId);

      const ctx = {
        answers: copiedAnswers,
        metadata: copiedMetadata
      };

      updatePage(replacePipedValuesWithNewIds(pageModel, ctx));
    });
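As a sketch of what `replacePipedValuesWithNewIds` might do, assuming piped values are embedded in fields as `{{answer:<id>}}` / `{{metadata:<id>}}` placeholders (an illustrative format, not necessarily the real encoding):

```javascript
// Hypothetical sketch: rewrite piped Ids embedded in page fields.
// The placeholder syntax below is an assumption for illustration.
const PIPED_FIELDS = ["title", "description", "guidance"];

function replacePipedValuesWithNewIds(pageModel, ctx) {
  const page = { ...pageModel };
  for (const field of PIPED_FIELDS) {
    if (typeof page[field] !== "string") continue;
    page[field] = page[field]
      .replace(/{{answer:(\w+)}}/g, (m, id) => `{{answer:${ctx.answers[id] || id}}}`)
      .replace(/{{metadata:(\w+)}}/g, (m, id) => `{{metadata:${ctx.metadata[id] || id}}}`);
  }
  return page;
}
```

Any piped Id that has no entry in the lookup maps is left untouched, which keeps the pass safe to re-run.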

Method 2: Locally scoped Ids

This method would involve having locally scoped Ids for entities within a questionnaire.

For example, imagine we have two questionnaires. At present the sections might look something like this.

sections

| id | title     | questionnaireId |
|----|-----------|-----------------|
| 1  | Section A | 1               |
| 2  | Section A | 2               |

If we were to copy the section from questionnaire 1 at present, you'd end up with something like this...

| id | title     | questionnaireId |
|----|-----------|-----------------|
| 1  | Section A | 1               |
| 2  | Section A | 2               |
| 3  | Section A | 1               |

A new section has been created with Id 3. This is problematic because now every page below this section needs to know about the newly created section Id and have their values updated.

One approach to combating this complexity would be to move to using locally scoped Ids.

| id | scoped_id | title     | questionnaireId |
|----|-----------|-----------|-----------------|
| 1  | 1         | Section A | 1               |
| 2  | 1         | Section A | 2               |

This way two rows can share the same scoped Id, as seen above: the section in questionnaire 1 and the section in questionnaire 2 both have a scoped Id of 1.

Since sections and pages are accessed via a clearly understood URL pattern of /<questionnaireId>/<sectionId>/<pageId>, we would be able to use the following URLs to access either section:

<baseUrl>/1/1 - Questionnaire 1 Section 1

<baseUrl>/2/1 - Questionnaire 2 Section 1

This would be nicer than our current auto-incremented URLs, where things like this are not uncommon:

<baseUrl>/250/1000/2500 - Here it's not obvious why the section and page have such high Id values. The next page is not 2501, as one might expect, but could be a completely unrelated number.

The complexity with this approach is that the next Id needs to be calculated at insert time. So if we consider the previous example of adding a new section:

| id | scoped_id | title     | questionnaireId |
|----|-----------|-----------|-----------------|
| 1  | 1         | Section A | 1               |
| 2  | 1         | Section A | 2               |
| 3  | 2         | Section A | 1               |

We need to know that Section 3 should have a scoped id of 2. This becomes more tricky when you also consider moving pages between sections, and moving sections between questionnaires (which is an upcoming requirement).
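The calculation itself is simple when the existing rows are to hand. A sketch, assuming the sections are available in memory (in practice this would be a `MAX(scoped_id) + 1` query scoped to the questionnaire, with care taken around concurrent inserts):

```javascript
// Compute the next scoped_id for a new section within a questionnaire.
// Illustrative only: a real implementation would do this in SQL and
// guard against two inserts racing for the same scoped_id.
function nextScopedId(sections, questionnaireId) {
  const scopedIds = sections
    .filter(s => s.questionnaireId === questionnaireId)
    .map(s => s.scopedId);
  return scopedIds.length === 0 ? 1 : Math.max(...scopedIds) + 1;
}
```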

It simplifies copying the row values, and in theory we no longer have to maintain references to the things that are copied. But to me it feels like we're moving the complexity elsewhere, namely into the insert/move/delete logic. That said, this could be hidden behind a view table, similar to how the order values are calculated.

Another drawback of this approach is that pages would then hang off the scoped_id, so we'd lose referential integrity.

Method 3: Changing the persistence mechanism

At the moment we're using a relational database, which has proven to work quite well for our use case, but it is starting to show its limitations.

One thing that might be worth some consideration is moving away from a relational database to using an alternative persistence layer, such as a document database or a graph database.

In many respects, what we're allowing authors to build using our tool closely resembles a document of some kind, and we typically read entire documents (i.e. we get the sections, pages and associated entities for a questionnaire), so a questionnaire feels like a logical document boundary.
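With a document model, duplication could in principle reduce to a deep copy of a single document. A sketch, where the document shape shown is illustrative rather than our actual schema:

```javascript
// Illustrative questionnaire-as-document shape. With everything nested
// inside one document, duplicating a questionnaire is essentially a deep
// copy with a fresh top-level Id; nested entities keep their locally
// meaningful Ids, so no reference-fixing pass is needed.
const questionnaire = {
  id: 1,
  title: "Questionnaire 1",
  sections: [
    { id: 1, title: "Section A", pages: [{ id: 1, title: "Page 1", answers: [] }] }
  ]
};

function duplicateQuestionnaire(original, newId) {
  const copy = JSON.parse(JSON.stringify(original)); // naive deep copy
  copy.id = newId;
  return copy;
}
```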

In other respects, what we're building feels like an interconnected graph of nodes and edges (relationships between the nodes). For example, a questionnaire node has a number of Section nodes connected to it via sectionOf edges. An answer is connected to a page via both an answerOf edge and a pipedValue edge.
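To make the graph view concrete, here is a minimal node/edge sketch. The edge labels (sectionOf, answerOf, pipedValue) come from the description above; the data shape itself is an assumption for illustration:

```javascript
// Illustrative node/edge representation of the questionnaire graph.
const nodes = [
  { id: "q1", type: "Questionnaire" },
  { id: "s1", type: "Section" },
  { id: "p1", type: "Page" },
  { id: "a1", type: "Answer" }
];
const edges = [
  { from: "s1", to: "q1", label: "sectionOf" },
  { from: "a1", to: "p1", label: "answerOf" },
  { from: "a1", to: "p1", label: "pipedValue" }
];

// e.g. find everything directly attached to a node:
const edgesOf = (id) => edges.filter(e => e.from === id || e.to === id);
```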

Whilst this change is likely to be quite large and far-reaching, I don't think we should automatically discount it based on its size.

It would likely simplify:

  • Duplication logic
  • Ordering logic (lists are first-class citizens)
  • Scoped Ids, which you get for free since everything is scoped to the questionnaire document

AWS has a DynamoDB service that we could use, and there are graph database implementations that we could look at as well. Each has associated Docker containers that we could use for testing etc.
