Uploading job list is extremely slow #115

Open
nclaidiere opened this issue Apr 15, 2022 · 14 comments
Labels
enhancement (New feature or request)

Comments

@nclaidiere
Collaborator

Uploading job list is extremely slow, about 3 mins for a 5kb csv file (I stopped counting when I had to upload a 10k trial list, I did it overnight but I think it was more than 1h).

@dschreij
Member

I haven't experienced this actually! For me this is quite fast, even for larger files.
@smathot what are your experiences with this?

@nclaidiere
Collaborator Author

OK, so one question is: does this depend on the number of participants? Does the upload create job lists, session data, etc. before assigning participants?

@smathot
Collaborator

smathot commented May 13, 2022

I have also never experienced this, but of course I've never worked with a database that's as populated as @nclaidiere's.

@smathot
Collaborator

smathot commented Jun 23, 2022

I am testing this now on the server of @nclaidiere and I can confirm that this is really slow, about 30 seconds for a table with 50 rows and 2 columns. (Whereas otherwise the server is fairly snappy.) I suspect that Study.processJobs() is the culprit.

@dschreij
Member

dschreij commented Sep 18, 2022

Did some research into this after I also experienced bad performance during an automated test run of a study. During that run, each query took about two seconds (according to the OpenSesame console), while for @smathot execution was in the millisecond domain (#92 (comment)).
My assumption is that this process is so slow because of queries taking a lot longer than usual, and there are a lot of queries in this function.
I found quite a few reports of MySQL vastly underperforming when running in Docker (on Windows), for instance:

There are also some proposed solutions for which varying degrees of success are reported:

  • Add :delegated to the mysql data path mapping in docker-compose (see the sketch after this list).
  • Map the Docker data folder to a physical location on the disk. It currently points to a volume container, which is the preferred method, but other than that there wouldn't be much difference in persistence between the two options.
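
A minimal sketch of what both options could look like in docker-compose (the volume and path names here are assumptions, not the project's actual configuration):

    volumes:
      # option 1: relax host/container file-sharing consistency
      # (only has an effect on Docker Desktop; ignored on plain Linux)
      - mysql-data:/var/lib/mysql:delegated
      # option 2: bind-mount to a physical folder on the host
      # instead of a named volume container
      # - ./mysql-data:/var/lib/mysql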

I'll see if I have time to play around with these soon and see if they have any effect. If none of these work, we can also try to:

  • Replace MySQL with PostgreSQL, another popular and widely used open-source database. I mainly work with it in my current projects, and it works really well in Docker and on Windows. Since this project uses an ORM that supports Postgres out of the box, it shouldn't be that much work to replace (but it may bring some surprises too). See the sketch after this list.
  • Install and run the Windows version of MySQL locally, making Docker unnecessary. This involves a few more steps when setting up and makes the system less portable, but it may remedy this and other performance and memory issues.
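
For the Postgres route, a rough docker-compose sketch could look like this (image tag, credentials, and volume name are placeholders; the application's ORM configuration would also need to be switched to its Postgres connection):

    postgres:
      image: 'postgres:15'
      restart: always
      environment:
        POSTGRES_DB: omm          # placeholder database name
        POSTGRES_USER: omm        # placeholder user
        POSTGRES_PASSWORD: secret # placeholder password
      ports:
        - 5432:5432
      volumes:
        - postgres-data:/var/lib/postgresql/data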

@smathot
Collaborator

smathot commented Sep 19, 2022

My assumption is that this process is so slow because of queries taking a lot longer than usual, and there are a lot of queries in this function.

That does sound likely. That being said, uploading is also slow on servers for which the other queries go very fast.

@kristian-lange
Collaborator

So, the culprit lies in Study.js, in the function processJobs.

If there are many job entries in the uploaded job file, the following loop makes an unholy number of database requests, which slows down the whole upload procedure:

    for (const [i, row] of Object.entries(jsonData)) {
      // one INSERT into `jobs` per row of the uploaded file
      const job = await this.jobs().create({
        position: parseInt(i) + 1
      }, trx)

      // plus pivot-table INSERTs into `job_variable` for every variable of that row
      await job.variables().attach(Object.keys(varTable), (record) => {
        record.value = row[varTable[record.variable_id]]
      }, trx)
    }

It has been difficult for me to improve speed in this loop due to:

  1. my inexperience with the technology used (Adonis/Lucid)
  2. Adonis/Lucid being used in an outdated version. The version according to package.json is 6.3, but the way it is used in OMM is 4.x (https://legacy.adonisjs.com/docs/4.0/lucid). As far as I can see, 4.x does not offer a real ORM, which makes it more difficult to use.
  3. a complex database schema around studies, jobs, variables, and job_variable.

So far I have cut the time roughly in half by simply wrapping the for loop with const trx = await Database.beginTransaction() and await trx.commit(), which should avoid committing the database requests one by one and instead commit them all at once at the end (see the sketch below).

Maybe this is already enough. If not, I will have to revisit this issue and probably try to refactor the database schema.
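
For clarity, a rough sketch of that change (not the exact committed code; it assumes the Adonis 4.x Database provider and the variables from the snippet above):

    const Database = use('Database')

    // open one transaction for the whole upload instead of
    // letting every row's queries auto-commit individually
    const trx = await Database.beginTransaction()
    try {
      for (const [i, row] of Object.entries(jsonData)) {
        const job = await this.jobs().create({
          position: parseInt(i) + 1
        }, trx)
        await job.variables().attach(Object.keys(varTable), (record) => {
          record.value = row[varTable[record.variable_id]]
        }, trx)
      }
      await trx.commit()   // everything is committed to the database at once
    } catch (err) {
      await trx.rollback() // keep jobs/job_variable untouched if anything fails
      throw err
    }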

@smathot
Collaborator

smathot commented Jul 9, 2024

Thanks for this!

Maybe this is already enough. If not, I will have to revisit this issue and probably try to refactor the database schema.

Why don't you try uploading a list of say 1000 rows? If this works within a reasonable time frame, then that's ok.

Some background: once we've refactored the database as described under #155, it will no longer be necessary to upload extremely long job lists of tens of thousands of rows. The reason for such long lists would be to implement repeating trials through this route, rather than by resetting job states, so as to avoid triggering database issues.

This means that our goal is now to make uploading of regular-sized lists fast enough so that it doesn't significantly impair usability. How long does 1000 rows take for you?

@kristian-lange
Collaborator

Why don't you try uploading a list of say 1000 rows? If this works within a reasonable time frame, then that's ok. ...
How long does 1000 rows take for you?

What would you consider a reasonable time frame here? And does the content of the job file matter? I currently use a simple job file with entries like:

seq,firstname
1,Charlie
2,Adele
3,Bill
4,Josie
5,Louisa

With this data it takes the following to upload a job file:

  • 1000 jobs: ~6s
  • 10,000 jobs: ~60s

But those times aren't much different from how it was before.

Some background: once we've refactored the database as described under #155, it will no longer be necessary to upload extremely long job lists of tens of thousands of rows. The reason for such long lists would be to implement repeating trials through this route, rather than by resetting job states, so as to avoid triggering database issues.

I'm actually working on #155 now and trying to think of a suitable database schema. I thought #155 was about refactoring the schema for the tables that store the resulting job data, namely the job_states table, whose data field of type JSON contains the result data. But the jobs that are uploaded are held in different tables, namely jobs, job_variable, and variables. I made a visualization of the database schema, mostly for my own understanding.

[Image: visualization of the database schema]

  • tables responsible for storing jobs: jobs, job_variable, and variables, linked to studies
  • tables responsible for storing job results: job_states and job_statusses, linked to participants

I assume that refactoring the job result tables would have no impact on the job tables, and therefore no impact on the upload times that this ticket is about. Am I right?

Maybe my confusion stems from the word 'trial', which does not exist in the database schema. Can you define 'trial' for me again? Is it equal to a job result?

Also, maybe we should continue the discussion in #155.

@smathot
Collaborator

smathot commented Jul 11, 2024

With this data it takes the following to upload a job file: 1000 jobs: ~6s; 10,000 jobs: ~60s. But those times aren't much different from how it was before.

Those times are reasonable, but don't correspond to what @nclaidiere experienced on their server. This may be because the performance of MySQL is very different in their environment, which is a Windows system running Docker. What kind of environment are you using to test this?

I assume that refactoring the job result tables would have no impact on the job tables, and therefore no impact on the upload times that this ticket is about. Am I right?

That is correct.

Maybe my confusion stems from the word 'trial', which does not exist in the database schema. Can you define 'trial' for me again? Is it equal to a job result?

Yes. In most experiments, a trial would equal a job, and the result of a trial would equal a job result. So you can treat them as synonymous for this purpose.

@kristian-lange
Collaborator

Those times are reasonable, but don't correspond to what @nclaidiere experienced on their server. This may be because the performance of MySQL is very different in their environment, which is a Windows system running Docker. What kind of environment are you using to test this?

I use my Linux laptop with 32 GB of memory and a good CPU. Maybe the MySQL container in the production setup has some constraints; with Docker one can limit the resources each container gets. Or the host as a whole doesn't have many resources. Do you know the numbers of the production system?

Another reason could be the simple job data I used, with just two columns. I did a new test, this time with 11 columns:

seq,firstname,lastname,age,street,city,state,zip,dollar,pick,date
1,Vincent,Reid,50,Ruwce Parkway,Ursonib,LA,86135,$1297.26,YELLOW,05/23/2024
2,Kenneth,Goodman,64,Vovmow Park,Bujoba,MO,74213,$842.37,BLUE,03/22/1937
3,Travis,Higgins,60,Foba Manor,Zomleaki,MT,43770,$9467.29,YELLOW,06/21/1932
4,Minerva,McCoy,56,Avenu Drive,Lodagab,MA,22653,$4011.00,GREEN,02/08/2016
5,Virginia,Howard,37,Jutebe Court,Idtiwmen,NV,44747,$9778.00,GREEN,01/22/1914
...

and it took longer:

  • 1000 jobs: 31s
  • 10,000 jobs: 104s

If Nicolas has even more complex job data it can take even longer.

Ah, now I re-read Nicolas' original statement: "Uploading job list is extremely slow, about 3 mins for a 5kb csv file (I stopped counting when I had to upload a 10k trial list, I did it overnight but I think it was more than 1h)." The job data files from today's test are 76 kB and 769 kB, so it is substantially faster on my system. I have four ideas for the cause:

  1. As you already stated, the production system might have some constraints. We need to know the system setup there: CPU, memory, disk (free and used).
  2. The MySQL container might have some limitations.
  3. Another thing that comes to my mind: MySQL has a binary log that is often activated by default in newer versions and that can eat up a substantial amount of resources. In most cases the binary log is never used because other backups are available, so it can be safely deactivated (see the sketch after this list).
  4. The database might slow down as more data is stored in it. This would hint at a problem with the database schema. A well-known cause of this symptom is a missing index, although I haven't spotted a missing index in the schema so far. But our schema is definitely on the complex side.
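
If the binary log indeed turns out to be unused on the production system, a minimal sketch of deactivating it on the official mysql:8 image (assuming the service is configured via a command: list in docker-compose) would be:

    command:
      - --skip-log-bin   # disables binary logging entirely on mysql:8
      # alternatively keep it and let entries expire quickly via
      # --binlog_expire_logs_seconds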

So first we need some help from @nclaidiere: Can you please give us the metrics for memory (free and used), disk (free and used), and CPU on the production system? Are there other applications running on the host? And can you please check if the binary log is activated on MySQL? And last, do you impose limits on the MySQL Docker container?
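
For reference, those numbers could be gathered roughly like this (standard Linux, Docker, and MySQL commands; the container name and credentials are assumptions and may need adjusting):

    # memory and disk usage on the host
    free -h
    df -h

    # live CPU/memory usage (and the memory limit) of the MySQL container
    docker stats --no-stream mysql

    # check whether the binary log is enabled (ON/OFF)
    docker exec -it mysql mysql -u root -p -e "SHOW VARIABLES LIKE 'log_bin';"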

@smathot
Collaborator

smathot commented Jul 12, 2024

Ok, @nclaidiere we'll wait for your input on this.

@nclaidiere
Collaborator Author

Hi, I am back from a break.

I did some tests: on my Windows laptop, a 1000 rows x 10 columns job list takes 1 min to upload; on the now more powerful Linux server (128 GB RAM, Intel Xeon Silver 4216 CPU @ 2.10 GHz × 64) it takes about 27 minutes.

Can you please give us the metrics for memory (free and used), disk (free and used), and CPU on the production system?
Memory (RAM) and disk space are largely free (> 80%).

Are there other applications running on the host?
Nope, it is used only for this

And can you please check if the binary log is activated on MySQL?
Binlog is activated, but I purge the logs very quickly; this has been the source of many problems. Here is the relevant section of the docker-compose file:
    services:
      mysql:
        container_name: mysql
        image: 'mysql:8'
        command: --sort_buffer_size=1G
        restart: always
        ports:
          - 3306:3306
        volumes:
          - mysql-data:/var/lib/mysql
        environment:
          MYSQL_ROOT_PASSWORD: 4D#BF_RPtrQ3Q2n=VuNR!k5WvhLsmB#8
          MYSQL_DATABASE: omm
          MYSQL_USER: omm
          MYSQL_PASSWORD: LhWxqJ^?pn@6AK+md6wZ_hsA=Qz52DSm
        command:
          - --sort_buffer_size=5M
          - --innodb_redo_log_capacity=1073741824
          - --binlog_expire_logs_seconds=86400

And last, do you impose limits on the MySQL Docker container?
I don't think so, hence the usage of all the available memory when the binlogs are not purged.

I'm not sure, but I have a feeling this also depends on the number of participants: 6 on the laptop, 36 on the server. If that is the case, then it is a problem, since the number of participants can easily increase to the order of a few thousand.

Hope this helps!

@dschreij
Member

dschreij commented Jul 15, 2024 via email
