
Docker image #53

Draft: wants to merge 6 commits into main from jd/docker-image

wuputah (Collaborator) commented Jun 21, 2024

Starting point for a Postgres image with pg_duckdb installed.

owenthereal commented Jun 21, 2024

For context, the reason we can't build hydra with pgxman as part of its CI is an intricate limitation of pgxman's publication mechanism. It requires the buildkit file to exist and be published in a central repo, pgxman/buildkit, and that file refers to a tag for downloading the source. This creates a circular dependency: tagging an extension requires pgxman to build the buildkit artifacts, but building the artifacts requires the tag to exist first. For now, this can be worked around by:

  1. Creating a buildkit in this repo with its source pointing to a local path (this already works but is undocumented). This would generate the Debian files to copy into a Docker image. Even better, we could add a command, pgxman build --export DOCKER_IMAGE, that does both.
  2. For publication, duplicating the same buildkit and saving it to pgxman/buildkit with its source pointing to a tag's source. Having two canonical copies of the buildkit file is not ideal, but it's not the end of the world.

When we can self-publish extensions from individual repos in the future, the circular dependency will be broken, and this workaround will no longer be necessary.

wuputah (Collaborator, Author) commented Jun 21, 2024

Heya @owenthereal, sorry for the lack of context. I added you here partly because of pgxman, but also because I'm a bit of a noob when it comes to Docker, so I wanted you to check my work and suggest any improvements. For instance, I'm currently creating a checker image to run the tests, but I guess Docker determines that this stage isn't necessary to produce the final image, so it skips it. Of course, I could just run the tests in builder instead.

I do think it would be possible to build with pgxman based on the docs here; as you suggest, a buildkit YAML file would be needed as well.
https://docs.pgxman.com/building_an_extension#test-the-extension

IMO this would just be for making "dev builds" as desired, though for local testing the duckdb build takes a long time and the ccache setup seems to not work super well, so it's not a great local developer tool.

owenthereal commented

> For instance I'm currently creating a checker image to run the tests but I guess Docker determines that this isn't necessary to make the final image, so it skips this step.

You could build a specific stage with --target (https://docs.docker.com/reference/cli/docker/image/build/#target), e.g. docker build ... --target checker
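If the tests should instead run on every normal build, one common BuildKit pattern is to make the final stage depend on the checker stage: BuildKit only builds the stages the requested target actually needs, so a COPY from checker forces it to run. A minimal sketch, assuming the builder/checker/output stage names used in this PR; the marker file /tmp/installcheck-passed is hypothetical, and the base and builder stages are defined as elsewhere in the Dockerfile:

```dockerfile
FROM builder AS checker
USER postgres
# Hypothetical marker file: it is only created if installcheck succeeds
RUN make installcheck && touch /tmp/installcheck-passed

FROM base AS output
COPY --from=builder /out /
# Depending on a file from checker makes BuildKit build (and therefore test) that stage
COPY --from=checker /tmp/installcheck-passed /tmp/installcheck-passed
```

The trade-off is that every image build then pays for the test run; keeping the tests behind --target checker avoids that cost.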

> though for local testing the duckdb build takes a long time and the ccache setup seems to not work super well

Not saying we should replace this Docker build with pgxman build right now, but being able to specify cache dir would be a nice pgxman feature to add in the future.

mike-luabase commented

This is what ultimately worked for me:

```dockerfile
FROM postgres:16-bookworm AS base

###
### BUILDER
###
FROM base AS builder

RUN --mount=type=cache,target=/var/cache/apt \
  apt-get update -qq && \
  apt-get install -y build-essential libreadline-dev zlib1g-dev flex bison libxml2-dev libxslt-dev \
  libssl-dev libxml2-utils xsltproc pkg-config libc++-dev libc++abi-dev libglib2.0-dev libtinfo5 cmake \
  libstdc++-12-dev postgresql-server-dev-16 liblz4-dev ccache git

WORKDIR /build

# Route compiler calls through ccache; the cache itself lives in a BuildKit cache mount
ENV PATH=/usr/lib/ccache:$PATH
ENV CCACHE_DIR=/ccache

# Clone the pg_duckdb repository and initialize submodules
RUN git clone --branch main https://github.com/duckdb/pg_duckdb.git . && \
    git submodule update --init --recursive

# permissions so we can run as `postgres` (uid=999,gid=999)
RUN chown -R postgres:postgres .
RUN chown -R postgres:postgres /usr/lib/postgresql /usr/share/postgresql
RUN mkdir /out && chown postgres:postgres /out
RUN rm -f .depend

USER postgres

# Build and install: once into the image's Postgres directories (used by installcheck),
# and once into /out so the output stage can copy only the installed files
RUN --mount=type=cache,target=/ccache/,uid=999,gid=999 make install
RUN --mount=type=cache,target=/ccache/,uid=999,gid=999 DESTDIR=/out make install

###
### CHECKER
###
FROM builder AS checker

USER postgres
RUN --mount=type=cache,target=/ccache/,uid=999,gid=999 make installcheck

###
### OUTPUT
###
# This creates a usable postgres image but without the packages needed to build
FROM base AS output
COPY --from=builder /out /
```
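pg_duckdb is meant to be loaded through shared_preload_libraries, so the output stage above still needs a little configuration before CREATE EXTENSION works. A minimal sketch, assuming the official postgres image's entrypoint behavior (it accepts extra `postgres -c ...` arguments in CMD and runs SQL files from /docker-entrypoint-initdb.d when the data directory is first initialized); the file name 001-pg_duckdb.sql is hypothetical:

```dockerfile
FROM base AS output
COPY --from=builder /out /

# Create the extension in the default database on first initialization (runs as root here)
RUN echo "CREATE EXTENSION IF NOT EXISTS pg_duckdb;" > /docker-entrypoint-initdb.d/001-pg_duckdb.sql

# Preload the extension library every time the server starts
CMD ["postgres", "-c", "shared_preload_libraries=pg_duckdb"]
```

Passing the setting through CMD keeps postgresql.conf untouched; mounting a custom config file would work just as well.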

jorinvo commented Sep 19, 2024

I would love to install pg_duckdb in our Postgres Docker image. I tried the Dockerfile from above, but it sat at 100% during make install for 30 minutes.

mike-luabase commented

> I would love to install pg_duckdb in our Postgres Docker image. I tried the Dockerfile from above, but it was stuck at 100% at make install for 30 minutes.

Took a very long time to run for me too. Might have been more than 30 minutes before it was complete.

jorinvo commented Sep 20, 2024

Thanks @mike-luabase, that's good to know.
I am afraid that's not usable for us for now. But I am looking forward to having a prebuilt image available some day. pg_duckdb is definitely an exciting project 🤩

JelteF (Collaborator) commented Sep 20, 2024

To make the build faster, it would help a lot to run the make install commands in parallel, e.g. with make -j10 install, or maybe make -j$(nproc) install.
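Applied to the builder stage of the Dockerfile earlier in this thread, that suggestion would look roughly like this. It is a sketch: $(nproc) is expanded by the RUN instruction's /bin/sh at build time, so the job count follows the CPUs available to the build.

```dockerfile
USER postgres

# Build and install in parallel; $(nproc) is evaluated by the shell during the build
RUN --mount=type=cache,target=/ccache/,uid=999,gid=999 make -j$(nproc) install
RUN --mount=type=cache,target=/ccache/,uid=999,gid=999 DESTDIR=/out make -j$(nproc) install
```

As the next comment notes, the -j flag does not necessarily reach the DuckDB sub-build.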

wuputah force-pushed the jd/docker-image branch 3 times, most recently from c38f089 to 4419031 on September 27, 2024 18:37
wuputah changed the base branch from main to jd/makefile on September 27, 2024 18:39
wuputah (Collaborator, Author) commented Sep 27, 2024

> To make the build faster it would help a lot if you changed the make install commands to be parallel, by using e.g. make -j10 install. Or maybe make -j$(nproc) install

Even after #211, there is still an issue with passing -j through to the DuckDB build that I haven't managed to solve (though I tried). You'll see this printed in the log when running make duckdb:

```
make[2]: warning: jobserver unavailable: using -j1.  Add `+' to parent make rule.
```

This is despite the fact that we use $(MAKE), as suggested for this issue. It probably has something to do with DuckDB's use of CMake, but I don't know anything about CMake.
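One workaround that may be worth trying (untested here): GNU make only hands its jobserver to recipe lines it recognizes as recursive, meaning lines that contain $(MAKE) or start with +. A make started indirectly by cmake --build inside a plain recipe line therefore sees the jobserver flags in MAKEFLAGS but cannot use them, and falls back to -j1. CMake 3.12 and later read the CMAKE_BUILD_PARALLEL_LEVEL environment variable in cmake --build and pass an explicit job count to the underlying build tool. This assumes DuckDB's Makefile drives its build through cmake --build, which has not been verified in this PR.

```dockerfile
USER postgres

# Hypothetical workaround: give the CMake-driven DuckDB build its own explicit job count
RUN --mount=type=cache,target=/ccache/,uid=999,gid=999 \
    CMAKE_BUILD_PARALLEL_LEVEL=$(nproc) make -j$(nproc) install
```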

wuputah force-pushed the jd/docker-image branch 10 times, most recently from 7d442d8 to 0801d3f on September 27, 2024 20:27
wuputah force-pushed the jd/docker-image branch 3 times, most recently from 43afc2c to 99b8660 on September 27, 2024 22:07
JelteF added this to the 0.1.0 stability testing milestone on Sep 30, 2024
JelteF added the developer experience (Improves our own lives) label on Sep 30, 2024
Base automatically changed from jd/makefile to main September 30, 2024 16:06
Review comment on .dockerignore (outdated, resolved)
wuputah force-pushed the jd/docker-image branch 3 times, most recently from 9e9eb02 to 2523a1c on October 2, 2024 19:56
Labels: developer experience (Improves our own lives)
Projects: None yet
5 participants