
Missing labels for updated response options in fetch_survey() #278

Open

context-dependent opened this issue Aug 25, 2022 · 9 comments

@context-dependent

Problem

I've recently found several instances of this behaviour. I'm not sure exactly what's going on, but I have some guesses. Below, I compare example_responses_Rics <- fetch_survey() with example_responses_csv <- read_csv() to illustrate.

In the survey, the question labeled satis_overall began with a set of response options that were later very slightly altered (a trailing space was removed from several of them). The Rics export fails to parse the updated response options, though the data are present in the untreated csv export.

Starting with the following:

library(tidyverse)
library(qualtRics)

example_responses_Rics <- fetch_survey("SURVEY_ID")
example_responses_Rics |> count(satis_overall)

Counting the column produces

# A tibble: 4 x 2
  satis_overall                          n
  <ord>                              <int>
1 Neither satisfied nor dissatisfied     8
2 Very satisfied                       119
3 NA                                    11
4 NA                                    43

While doing the same for the csv export

example_responses_csv <- read_csv("path/to/csv_download.csv")
example_responses_csv |> count(satis_overall)

shows the updated response options, but as unordered character values.

# A tibble: 6 x 2
  satis_overall                          n
  <chr>                              <int>
1 Neither satisfied nor dissatisfied     8
2 Somewhat dissatisfied                  3
3 Somewhat satisfied                    36
4 Very dissatisfied                      4
5 Very satisfied                       119
6 NA                                    11

In the Rics export, the column is a factor, but its levels are outdated (they retain a trailing space that was deleted from the survey).

levels(example_responses_Rics$satis_overall)
# [1] "Very dissatisfied "
# [2] "Somewhat dissatisfied "
# [3] "Neither satisfied nor dissatisfied"
# [4] "Somewhat satisfied "
# [5] "Very satisfied"
# [6] NA
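For anyone trying to reproduce the mismatch outside of qualtRics: when factor() receives a levels vector whose entries have trailing spaces that the observed values lack, the non-matching values silently become NA. A minimal sketch with made-up data (not the package's actual parsing code):

```r
# Levels as returned by the stale survey-description metadata;
# note the trailing spaces on three of them:
stale_levels <- c(
  "Very dissatisfied ",
  "Somewhat dissatisfied ",
  "Neither satisfied nor dissatisfied",
  "Somewhat satisfied ",
  "Very satisfied"
)

# Values as they actually appear in the csv export (spaces removed):
observed <- c(
  "Very satisfied",
  "Somewhat satisfied",
  "Neither satisfied nor dissatisfied"
)

parsed <- factor(observed, levels = stale_levels, ordered = TRUE)

# "Somewhat satisfied" does not match "Somewhat satisfied " exactly,
# so it is silently coerced to NA:
sum(is.na(parsed))  # 1
```

This matches the behaviour above: every response option whose label changed after the metadata was cached drops to NA with no warning.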

Thoughts on solution

fetch_survey() uses Qualtrics's metadata to parse factors. The response-options metadata appears not to update with changes to the survey, so I don't see a clear path to solving this problem while maintaining both the factor-parsing functionality and the fast metadata + csv approach.

It feels like it probably shouldn't fail silently, as in many use cases the qualtRics user and the survey maintainer are not the same person.

The .sav export is larger than the csv, but it bakes in updated labels that can be converted into factor levels. To me, this strengthens the case for parameterizing the export format in fetch_survey(), though I'm not sure where the zeitgeist is on that concept.

@jmobrien
Collaborator

I'm assuming you were using convert = TRUE (as is the default?). And when you say this:

though the data are present in the untreated csv export.

Are you referring to the manual download from the webpage, or something else?

Also, did the changes to the question satis_overall occur before or after the survey was initially published (whether or not it had responses yet)?

I'm thinking this could relate to several things, but one main underlying problem is that convert = TRUE is still looking at the older survey description endpoint, which may be why you're experiencing issues.

@juliasilge, I think this is pointing again to the need for the work around #267.

@context-dependent
Author

@jmobrien this all adds up to me.

The changes to satis_overall happened after publication, and the factor levels seem to point in the direction of the description endpoint.

I was referring to the csv exported from the GUI. I just tried fetch_survey with convert = FALSE and the result is the same as reading in the csv.

@jmobrien
Collaborator

jmobrien commented Aug 26, 2022

@juliasilge what would you think about making convert = FALSE the default, at least until we can get things off the v2 endpoint? I know conversion has been the default behavior for a long time, but I'm concerned we might be steering users, maybe esp. new ones, towards potentially problematic behavior.

As @context-dependent noted, that change would just mean that users by default get the same data they get from a web download with default settings (still with some bonuses like cleaning up the metadata row & a column map)

@context-dependent
Author

@jmobrien changing the default behaviour to convert = FALSE is an elegant and straightforward solution. It could break extant data pipelines, but I'm not sure whether that's a dealbreaker.

The short-term options, imo, are:

1. Set the default to convert = FALSE

As you suggest, this would give new users data that meets their expectations by default, but may break code that currently relies on the present default behaviour.

2. Detect and handle conversion parsing failures

This would be a bigger lift, but I think still manageable in the short term. The crux is getting wrapper_mc() to detect parsing failures. Once that is done, it could raise a warning, skip conversion, change its behaviour, or some combination of the above.
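A hypothetical sketch of what that detection step could look like (the helper name and signature are my invention, not wrapper_mc()'s actual internals): compare the observed values against the expected levels before converting, and warn on any mismatch.

```r
# Hypothetical helper: warn about response values that are absent from
# the expected levels, then convert to an ordered factor as before.
convert_with_check <- function(x, expected_levels, var_name = "variable") {
  unmatched <- setdiff(unique(x[!is.na(x)]), expected_levels)
  if (length(unmatched) > 0) {
    warning(
      sprintf(
        "Column '%s': %d value(s) not in the expected levels (%s); they will become NA. Consider convert = FALSE.",
        var_name, length(unmatched), paste(unmatched, collapse = ", ")
      ),
      call. = FALSE
    )
  }
  factor(x, levels = expected_levels, ordered = TRUE)
}

# Stale metadata is missing "Somewhat satisfied", so a warning fires
# and that value still becomes NA (existing behaviour preserved):
out <- convert_with_check(
  c("Very satisfied", "Somewhat satisfied"),
  c("Very satisfied"),
  var_name = "satis_overall"
)
```

The warning names the problem column and the unmatched values, which is exactly the information a user would need to decide between fixing the survey metadata and falling back to convert = FALSE.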

Interested in thoughts, happy to help however I can.

@juliasilge
Collaborator

I'm hesitant to change the default behavior for something that has worked the same way for such a long time (predating my own involvement in this package); that is the type of change you can't really inform users about, apart from a message every single time they use the function (folks hate that). I think it would likely be better to change over to the new endpoint instead, for example by supporting the includeLabelColumns option. We can open an issue to track input on changing to convert = FALSE, but I think it would be better to update the underlying API call.

@jmobrien
Collaborator

jmobrien commented Sep 7, 2022

@context-dependent Thanks for your offer of help. It feels like the real fixes here are intertwined with a lot of other factors, so I'd be reluctant to have you commit much work to something that's likely to get changed.

But, if you're up for it, adding some kind of check + warning for parsing failures might be a great stopgap measure. Basically: if wrapper_mc() detects elements in some variable that are not in the list of expected levels obtained from the survey_questions() call, it would throw a warning that reports the problem variable(s) and any unmatched elements.

(@juliasilge thoughts? Along with the warning, should we also tack any unmatched elements onto the vector of factor levels? A simple warning is an easy way to preserve existing behavior for now, but basic users would have a hard time fixing anything after a warning, because all non-matched values are now NAs. Appending the levels means that users would need to rearrange them, but that's at least possible without going back to the web or redownloading.)
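The level-appending variant could look roughly like this (again a sketch with an invented helper name, not qualtRics code): tack any unmatched values onto the end of the level vector so no data is lost, at the cost of level order.

```r
# Sketch: append unmatched values to the expected levels so nothing
# becomes NA; the user may still need to reorder the levels afterwards.
convert_append_levels <- function(x, expected_levels) {
  unmatched <- setdiff(unique(x[!is.na(x)]), expected_levels)
  factor(x, levels = c(expected_levels, unmatched), ordered = TRUE)
}

# Stale metadata knows only one level; the other is appended last:
res <- convert_append_levels(
  c("Very satisfied", "Somewhat satisfied"),
  c("Very satisfied")
)
levels(res)      # both levels present; the unmatched one is appended last
sum(is.na(res))  # 0
```

The trade-off is as described above: no values are silently dropped, but the ordering of the appended levels is arbitrary, so <ord> comparisons on the affected column can't be trusted until the user reorders.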

@juliasilge
Collaborator

That sounds like a reasonable stopgap measure to me, and I would probably lean toward doing a simple warning to start with (with a recommendation to redownload with convert = FALSE?). We could extend that to some fixing/handling next.

@jmobrien
Collaborator

jmobrien commented Sep 8, 2022

That's probably a good balance, yes.

@context-dependent is this something you'd like to tackle? If so, great, and let us know if you have any questions. (If useful, you can see several recent use examples for the rlang::warn() warning tool in /R/Checks.R.)

@context-dependent
Author

Thanks! I would be happy to give it a shot next week if that works.
