Added HdfsUrlStreamHandlerFactory, set in LocalCachingContextClassLoaderFactory #48
base: main
Conversation
Added HdfsUrlStreamHandlerFactory and set it as the URL StreamHandlerFactory in a static initializer in LocalCachingContextClassLoaderFactory to handle the hdfs protocol. Closes apache#47
Resolved review comment on ...der/src/main/java/org/apache/accumulo/classloader/lcc/util/HdfsURLStreamHandlerProvider.java
Tested locally with an Accumulo instance and the process was able to download a jar file from HDFS.
If we used URI instead of URL in the Resource class, would that fix things w/o having to register this handler statically? Assuming we are registering a handler that supports creating an input stream, but we are not using that functionality? Is that correct? If that is correct, then we are creating and registering code that is not being used just so the string will work. That is what made me wonder if using URI instead would work.
Ran an experiment to see if URI was OK w/ hdfs.

Created #49.
ctubbsii left a comment
I'm going to try to do the Provider registration I suggested below instead of setting the factory, and if it works, I'll push an update to this PR.
Outdated (resolved) review comment on ...ader/src/main/java/org/apache/accumulo/classloader/lcc/util/HdfsUrlStreamHandlerFactory.java
```java
// method. We are not using org.apache.hadoop.fs.FsUrlStreamHandlerFactory
// because it overrides the handling of the "file" protocol.
static {
  URL.setURLStreamHandlerFactory(new HdfsUrlStreamHandlerFactory(new Configuration()));
}
```
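For context, a dependency-free sketch of the factory pattern this diff uses (the class name and stub handler below are illustrative, not the PR's actual code): the factory claims only the hdfs protocol and returns null for everything else, which is why the JDK's built-in file, http, and jar handlers remain in effect.

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLConnection;
import java.net.URLStreamHandler;
import java.net.URLStreamHandlerFactory;

// Illustrative only: handles just "hdfs" and defers every other protocol to the
// JVM by returning null (URL falls back to its built-in handlers in that case).
public class HdfsOnlyStreamHandlerFactory implements URLStreamHandlerFactory {
  @Override
  public URLStreamHandler createURLStreamHandler(String protocol) {
    if (!"hdfs".equals(protocol)) {
      return null;
    }
    return new URLStreamHandler() {
      @Override
      protected URLConnection openConnection(URL u) throws IOException {
        // The real factory would open the stream through Hadoop's FileSystem API.
        throw new IOException("hdfs access requires Hadoop on the classpath");
      }
    };
  }
}
```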
Apparently, in Java 9+, there's a better way to do this, that avoids the problem of needing to call this only once in the JVM. Instead of setting the factory, you can register a URLStreamHandlerProvider to handle the hdfs URLs. We can use @AutoService to generate the appropriate META-INF/services file.
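A rough sketch of that provider mechanism (names here are assumptions; the @AutoService annotation is shown as a comment so the snippet compiles with the JDK alone — in real code it would generate the META-INF/services/java.net.spi.URLStreamHandlerProvider entry that ServiceLoader reads):

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLConnection;
import java.net.URLStreamHandler;
import java.net.spi.URLStreamHandlerProvider;

// @AutoService(URLStreamHandlerProvider.class)  // would emit the services file
public class HdfsURLStreamHandlerProvider extends URLStreamHandlerProvider {
  @Override
  public URLStreamHandler createURLStreamHandler(String protocol) {
    if (!"hdfs".equals(protocol)) {
      return null; // unknown to us: the JVM tries its built-in handlers next
    }
    return new URLStreamHandler() {
      @Override
      protected URLConnection openConnection(URL u) throws IOException {
        // A real provider would delegate to Hadoop here; stubbed in this sketch.
        throw new IOException("hdfs support requires Hadoop on the classpath");
      }
    };
  }
}
```

Unlike URL.setURLStreamHandlerFactory (callable only once per JVM), any number of providers can coexist on the classpath, and the JVM consults them before its built-in handlers.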
In #49 I removed all calls to

I commented on that. We should not do that. I explained why on that PR.
I think #49 is cleaner because it avoids registering JVM-wide resources (which could cause code outside the classloader plugin to change behavior). I really like the locality. In the current code, FileResolver is where the download happens; if it's happy w/ the URL/URI then things are fine. Looking at the Java code that converts URIs to URLs, the main thing it checks is whether the URI has a scheme, which #49 adds a check for in FileResolver.
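A small demonstration of the distinction being discussed (the namenode address is made up): java.net.URI parses any scheme purely syntactically, while constructing a java.net.URL requires a stream handler registered for that scheme.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;

public class UriVsUrl {
  public static void main(String[] args) {
    // URI never consults handlers, so any scheme parses fine.
    URI uri = URI.create("hdfs://namenode:8020/jars/plugin.jar");
    System.out.println("scheme = " + uri.getScheme());

    // URL construction fails unless a handler for "hdfs" is registered.
    try {
      new URL("hdfs://namenode:8020/jars/plugin.jar");
      System.out.println("an hdfs handler is registered in this JVM");
    } catch (MalformedURLException e) {
      System.out.println("no hdfs handler: " + e.getMessage());
    }
  }
}
```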
* Register a provider for the hdfs: URL stream handling, rather than override the system factory
* Also fix SpotBugs issues and rename classes to match the naming convention of the classes they override (e.g. "URLStreamHandler" instead of "UrlStreamHandler")
It introduces the possibility of far more problems than it solves. Take a look at the changes I just pushed. This fixes it much more narrowly. All it does is say "Use Hadoop to open hdfs: URLs". Any other application's implementation would ultimately do the same thing, but if somebody really wanted something else to handle hdfs: URLs in their code, they can easily do so by manipulating the Java ServiceLoader files on their classpath (META-INF/services). In my change, we don't forcibly register it in code at all anymore. We just put a provider on the classpath, which the user can choose to omit/exclude without changing code (it may require a rebuild of the jar, or using a plugin to transform the jar contents to remove the service file, but that's not very difficult).
My concern isn't with
Force-pushed from ea54c13 to 732749a
Testing locally works with the latest commit.
Moving the change from calling the static method to a service loader is an improvement. This is still making a global change to the JVM, which can make for fun debugging (I have, multiple times, debugged code X that was broken by some odd global change that unrelated code Y made). I still think the URI change is cleaner, because more locality makes things more predictable and easier to reason about. These global JVM changes are not being made because the intent is to make a global change for all code in Accumulo; they are being made just to get this specific code to work. I am not opposed to these changes, just sharing my opinion. It seems both changes have some problems.
I think this could be resolved by not using the ServiceLoader and instead constructing URL objects using the constructor URL(URL context, String spec, URLStreamHandler handler). Looking at the constructor that just takes a String for the URL, we would pass
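A sketch of that per-URL-handler approach (the stub handler is an assumption for illustration): because the handler is passed directly to the constructor, nothing gets registered JVM-wide.

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLConnection;
import java.net.URLStreamHandler;

public class PerUrlHandler {
  public static void main(String[] args) throws Exception {
    URLStreamHandler hdfsHandler = new URLStreamHandler() {
      @Override
      protected URLConnection openConnection(URL u) throws IOException {
        // A real handler would open the connection through Hadoop; stubbed here.
        throw new IOException("not implemented in this sketch");
      }
    };
    // No JVM-wide registration needed: the handler rides along with this URL only,
    // and the inherited parseURL handles the generic //host:port/path syntax.
    URL jar = new URL(null, "hdfs://namenode:8020/jars/plugin.jar", hdfsHandler);
    System.out.println(jar.getProtocol() + " " + jar.getHost() + ":" + jar.getPort());
  }
}
```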
I like this suggestion, but it means we'd have to register a custom type adapter for GSON, which is frustrating.
* Remove resolvers (keep test coverage for file, http, and hdfs URL types)
* Use url.openStream()
* Be agnostic to different URL types
* Depend on an HdfsURLStreamHandlerProvider for hdfs URL support
* Place hdfs provider in its own module/jar
* Clean up related POM and README stuff

This fixes apache#47
This supersedes apache#48 and apache#49 alternate fixes
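The "be agnostic to different URL types" point boils down to downloading through the common URL API; a minimal sketch (the class and method names are made up, not the project's actual code) that works the same for file:, http:, and, with the provider jar on the classpath, hdfs: URLs:

```java
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;

public class Download {
  // Copies whatever the URL points at into localFile; all scheme-specific work
  // happens inside whichever URLStreamHandler the JVM resolves for this URL.
  static long download(URL url, Path localFile) throws Exception {
    try (InputStream in = url.openStream();
         OutputStream out = Files.newOutputStream(localFile)) {
      return in.transferTo(out);
    }
  }
}
```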