-
Notifications
You must be signed in to change notification settings - Fork 21
accumulo-proxy docker container #20
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…ker container. I have only implemented support for Accumulo 2.x and by default this first commit contains: * Accumulo 2.0.0 * Hadoop 3.2.1 * Zookeeper 3.5.7 A new document (DOCKER.md) has been created to start to document the implementation and usage guide which should allow others to test this if they so wished. A number of outstanding questions will be posted on the issue, there is also a number of TODOs still required to be implemented that I'm tracking.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@volmasoft responses to your questions below
Given that accumulo has it's docker code in accumulo-docker do we need to do the same with this repo e.g. accumulo-proxy-docker or are we happy to merge it into the main code base?
I like putting it here for now, and I like what you did in this PR.
My current thought is we should allow users to volume mount in (or rebuild the container) with a new proxy.properties and we could document both approaches
I think allowing to mount in a props files is good for now, fairly easy, and your documentation for it is good. I would be in favor of follow on work to allow passing options on the command line like docker run --name accumulo-proxy accumulo-proxy:latest -o <proxy prop1>=<val1> -o <proxy prop2>=<val2>. The accumulo script supports this behavior.
Currently this change doesn't support selecting zookeeper/accumulo/hadoop versions, they're hard coded.
I think the version ARGS in the dockerfile would allow this. So if a user wants to use certain version, then they need to run docker build. This could be documented.
e.g. instead of mvn package -Ptarball we could provide a mvn package -Pdocker or similiar.
I am not sure what mvn package -Pdocker would do, so its hard for me to comment on it. That could be a follow on issue, or if you have something specific in mind it could be pushed as part of this PR for discussion.
Looking at the accumulo-docker repo it was decided to put documentation into the container (README.md on this line)
My memory is a bit fuzzy, but I think the readme copy in accumulo-docker was a bit of a hack. I think all of the other files on that copy command were optional. However, if the docker copy command has zero things to copy it would fail. So the readme was added just to ensure the copy command always had at least one file.
Anyone know if I'm missing something in my configuration?
I have not had a chance to look into the specific problem you are seeing, but I do know that the latest version of ZK has moved their jars around inside their tarball and this causes problems for the default Accumulo 2.0 config.. The recent PR apache/accumulo-docker#12 added sed command to update the Accumulo 2.0 conf for ZK.
This may make the installed versions a bit less transparent if you were to jump into the container (e.g. it's not /opt/hadoop-3.2.1/). Does anyone have any strong feelings about this?
I think this was done in accumulo-docker to simplify/shorten the dockerfile. That is a good point about someone jumping into the file and looking around. They could still find the versions by digging a bit deeper and finding Hadoop or ZK jars. However I would not be opposed to making the dir names have the version.
…lso tidies up a number of other aspects. * These are now configured in the same mechanism as accumulo-docker see: apache/accumulo-docker@ff8cedf * The CLASSPATH env override has been removed. * The DOCKER.md and README.md file has been removed from the container. * Updated the DOCKER.md to try and be more explicit about networking choices depending upon your environment. * I've updated the install process for hadoop, zookeeper and accumulo to retain their version numbers in their install paths. * Symlinks have been added to avoid the need to update references should dependency versions change.
…nd contact us pages.
…the description and a renaming of ZooKeeper to it's offical uppercase Z uppercase K format.
mjwall
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good stuff here, thanks for the work
* Updated all references to "docker" to "Docker" where possible, similiar to ZooKeeper. * Updated the process to verify network connectivity. * Made a reference to the DOCKER.md from the main README.md so that people know it exists.
|
Thanks for all the feedback 👍 |
|
Thanks for the contribution @volmasoft. If you would like to be added as a contributor on the people page, please submit a PR to add yourself. |
Overview
Note this is a work in progress, still quite a bit of work to go but I thought I'd expose some of the work early.
I've tried to take the best of a few worlds and tried to keep consistency with both the accumulo-docker repo (https://github.com/apache/accumulo-docker) and the Apache guides here: (https://github.com/docker-library/official-images)
This also assumes that your instance is called "myinstance" and your zookeeper is available at "localhost:2181" from the container, this is NOT the case on some configurations such as Docker for Mac, but I'e left these as defaults for now until we figure out "(2) Configuration" below.
Note, I have left some commands commented out on the Dockerfile currently that make life easier to debug, these will be stripped before they get merged finally.
Currently I have a large number of TODOs and questions for people's input, each is numbered so feel free to answer some or all of them with your thoughts :)
If no one has any strong feelings about some of the questions below then I'll just press on.
(1) Repository
Given that accumulo has it's docker code in accumulo-docker do we need to do the same with this repo e.g. accumulo-proxy-docker or are we happy to merge it into the main code base?
(2) Configuration
I've been debating at length how to allow the default configuration to be overiden.
My current thought is we should allow users to volume mount in (or rebuild the container) with a new proxy.properties and we could document both approaches, this would allow full configuration to be done and versioned outside of our accumulo-proxy project.
However we probably ought to make it easy to 'get started' and therefore we could also provide a further approach to overriding:
A) Allow people to provide an environment variable to override properties e.g. PROXY_CONFIG="key:pair" and having the Java code parse this after the default proxy.properties and update the properties - This would allow any property to be updated.
B) Allow people to only override 2 properties via environment variables (instance.name, instance.zookeepers) since these are more than likely to meet 99% of use cases, we can always add additional ones in the future.
My gut feeling is a mixture of the volume mounting/rebuild guide, plus option (B) is probably the right balance for now and we can always change these later on.
(3) Software versions
Currently this change doesn't support selecting zookeeper/accumulo/hadoop versions, they're hard coded.
Do we want to allow this support?
Does this need doing on the initial commit or would it be acceptable to follow it up.
(4) Automated Build
We could use one of the maven plugins to automatically create the docker container with a new profile, e.g. instead of
mvn package -Ptarballwe could provide amvn package -Pdockeror similiar.This would be different to the standard accumulo-docker repo which details how to build using the
docker buildcommand.I'm happy either way.
(5) Documentation in container
Looking at the accumulo-docker repo it was decided to put documentation into the container (README.md on this line)
I'm not sure it provides value but I've added both the README.md and DOCKER.md into this container currently.
Should we strip this out?
(6) CLASSPATH setting
When debugging this application I found that for some reason
accumulo classpathprovides a classpath which includes zookeeper, however it only includes (in my case)/opt/apache-zookeeper/and we actually need/opt/apache-zookeeper/lib.Look at my diff
Dockerfileand look for the lineENV CLASSPATH=/opt/apache-zookeeper/lib/*Anyone know if I'm missing something in my configuration?
(7) Install path shortcut
If you look at
Dockerfileand look for the part where I add the dependencies to /opt/ (roughly line 61 starting with "# Install the dependencies into /opt/"I take the versions of software and strip their first folder so that they are in folders similiar to /opt/hadoop/ in one step.
This may make the installed versions a bit less transparent if you were to jump into the container (e.g. it's not /opt/hadoop-3.2.1/)
Does anyone have any strong feelings about this?
--- COMMIT LOG ---
Initial commit of the work to stand up an accumulo-proxy inside a doker container.
I have only implemented support for Accumulo 2.x and by default this first commit contains:
A new document (DOCKER.md) has been created to start to document the implementation and usage guide which should allow others to test this if they so wished.
A number of outstanding questions will be posted on the issue, there is also a number of TODOs still required to be implemented that I'm tracking.