MySQL - Change default character set for all tables to utf8mb4 #846

Open
adkinsrs opened this issue Aug 1, 2024 · 10 comments
Labels: bug (Something isn't working), data issues (Formatting, uploading, etc. of datasets)

Comments

@adkinsrs
Member

adkinsrs commented Aug 1, 2024

Currently, many if not all of the tables in our MySQL 8 database use the "latin1" character set. We are starting to see some datasets, such as the dataset at https://nemoanalytics.org/dataset_curator.html?dataset_id=63d168b4-30b5-b230-52d7-94034c417c5c, where some categories (in this case, cytokine family) contain Greek symbols. These symbols cannot be saved into the "plotly_config" field in the dataset_displays table because of the "latin1" encoding, which prevents these displays from being saved.
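To illustrate the encoding limit described above, here is a minimal sketch in Python. It assumes that Python's `cp1252` codec is a close enough stand-in for MySQL's "latin1" (which is based on cp1252), and the label `IL-1β` is a hypothetical category value, not one taken from the dataset.

```python
def can_encode(text: str, encoding: str) -> bool:
    """Return True if every character in `text` is representable in `encoding`."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# Hypothetical category value containing a Greek beta (U+03B2).
label = "IL-1\u03b2"

# Assumption: cp1252 approximates MySQL's "latin1" character set.
print(can_encode(label, "cp1252"))  # False
print(can_encode(label, "utf-8"))   # True
```

The same check could be run over a display's category values before saving, to report which labels would be rejected by a latin1 column.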

I looked at https://stackoverflow.com/questions/56365197/insert-greek-character-to-a-mysql-table-column-without-using-php and it suggests that the character set needs to be updated to at least utf8 to resolve this. However, a lot of these tables are linked together, so I think all of them will need to be updated appropriately.
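As a rough sketch of what the conversion could look like, the Python helper below only builds the SQL statements; it does not connect to a database. `ALTER TABLE ... CONVERT TO CHARACTER SET` is standard MySQL syntax, but the helper name, the `utf8mb4_0900_ai_ci` collation choice, the second table name, and the decision to toggle `FOREIGN_KEY_CHECKS` around the batch (because the tables are linked) are all assumptions to be checked against the real schema.

```python
import re

# Only allow plain identifiers, since table names are interpolated into SQL.
_IDENTIFIER = re.compile(r"^[A-Za-z0-9_]+$")


def conversion_statements(tables, charset="utf8mb4", collation="utf8mb4_0900_ai_ci"):
    """Build the statements that would convert each table to `charset`."""
    statements = ["SET FOREIGN_KEY_CHECKS = 0;"]  # assumption: linked tables need this
    for table in tables:
        if not _IDENTIFIER.match(table):
            raise ValueError(f"unexpected table name: {table!r}")
        statements.append(
            f"ALTER TABLE `{table}` CONVERT TO CHARACTER SET {charset} COLLATE {collation};"
        )
    statements.append("SET FOREIGN_KEY_CHECKS = 1;")
    return statements


# "dataset_displays" comes from this issue; "dataset" is a hypothetical linked table.
for statement in conversion_statements(["dataset_displays", "dataset"]):
    print(statement)
```

Generating the statements first makes it possible to review them, and to run them against a copy of the database, before touching production.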

@adkinsrs adkinsrs added bug Something isn't working data issues Formatting, uploading, etc of datasets labels Aug 1, 2024
@carlocolantuoni

Hi Shaun - is this the ticket related to why I can't make a "cytokine_family" violin plot with this dataset (Greek characters in the metadata values)?
https://nemoanalytics.org/dataset_curator.html?dataset_id=63d168b4-30b5-b230-52d7-94034c417c5c

@adkinsrs
Member Author

adkinsrs commented Aug 8, 2024

Yes

@carlocolantuoni

Would a simple but hacky fix be to translate the Greek characters into normal/Latin ones before asking plotly to save? I'd be OK with that if the real fix is a pain in the ass. E.g., just change the beta character to "b".

@adkinsrs
Member Author

adkinsrs commented Aug 12, 2024

I think the issue with translating from Greek to Latin to save the configuration is that we'd have to determine whether to translate back from Latin to Greek when reading the saved configuration in the plot curator. In many cases, we have entries that genuinely use a "b" rather than a "beta", so a lot of scripts would need extra code and steps just to decide which one to use (and this is true for any of the proposed translations). Personally, I think the better fix is to make the database tables permissive for Greek characters.
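The ambiguity can be shown with a small Python sketch. The mapping, the function name `to_latin`, and the two labels are hypothetical; the point is only that two distinct labels collapse to the same stored value, so the original cannot be recovered when reading back.

```python
# Hypothetical transliteration table for a few Greek letters.
GREEK_TO_LATIN = {"\u03b1": "a", "\u03b2": "b", "\u03b3": "g"}


def to_latin(label: str) -> str:
    """Replace mapped Greek letters with Latin ones; leave everything else as-is."""
    return "".join(GREEK_TO_LATIN.get(ch, ch) for ch in label)


# Two distinct hypothetical category labels: one with a Greek beta, one with "b".
labels = ["TGF-\u03b2", "TGF-b"]
stored = [to_latin(label) for label in labels]

print(stored)                              # ['TGF-b', 'TGF-b']
print(len(set(labels)), len(set(stored)))  # 2 1
```

Once both labels are stored as `TGF-b`, no reverse mapping can tell which one originally contained the Greek letter.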

@carlocolantuoni

carlocolantuoni commented Aug 12, 2024 via email

@adkinsrs
Member Author

@jorvis We could clone the nemo-prod server to a devel area, back up the database to a new database name, apply the change, and test the dataset curator link in the first comment (as well as our standard round of testing).
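The "back up the database to a new database name" step could be sketched as follows. This Python snippet only assembles the command lines and does not run them. The database names and dump path are hypothetical, credentials are assumed to come from a MySQL option file, and `--single-transaction` assumes the tables are InnoDB.

```python
def clone_plan(source_db: str, target_db: str, dump_path: str) -> list[dict]:
    """Describe the commands that would copy `source_db` into a new `target_db`."""
    return [
        # 1. Dump the source database; stdout must be redirected to the dump file.
        {
            "argv": [
                "mysqldump",
                "--single-transaction",
                "--default-character-set=utf8mb4",
                source_db,
            ],
            "stdout_to": dump_path,
        },
        # 2. Create the empty copy under a new name.
        {"argv": ["mysql", "-e", f"CREATE DATABASE `{target_db}`"]},
        # 3. Load the dump into the copy; the dump file is fed on stdin.
        {"argv": ["mysql", target_db], "stdin_from": dump_path},
    ]


# Hypothetical names, for illustration only.
for step in clone_plan("portal_db", "portal_db_utf8mb4_test", "/tmp/portal_db.sql"):
    print(step)
```

The character set change would then be applied to the copy only, leaving the original database available for comparison during testing.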

@carlocolantuoni

carlocolantuoni commented Aug 15, 2024 via email

@adkinsrs
Member Author

I meant the actual database where the encoding changes would need to take place, not the dataset.

@jorvis
Member

jorvis commented Aug 15, 2024

I'm up for this, and this weekend I can create the new devel instances.

@carlocolantuoni

carlocolantuoni commented Aug 15, 2024 via email
