java - Launch Solr Indexing in Nutch Source Code
I am trying to index a Nutch crawl into Solr from inside the source code, not from the command line.
I have created the following function:
    public static int runInjectSolr(String[] args, Properties prop) throws Exception {
        String solrUrl = "http://ec2-x-x-x-x.compute-1.amazonaws.com/solr/collection1";
        String crawldb = JobBase.getParam(args, "crawldb", null, true);
        String segments = JobBase.getParam(args, "segments", null, true);
        String args2[] = {crawldb, segments};
        Configuration conf = new Configuration();
        conf.set("-D solr.server.url", solrUrl);
        int code = ToolRunner.run(NutchConfiguration.create(), new IndexingJob(conf), args2);
        return code;
    }
But I am receiving the following error:
    2013-08-07 19:37:13,338 ERROR org.apache.nutch.indexwriter.solr.SolrIndexWriter (main): Missing SOLR URL. Should be set via -D solr.server.url
    SOLRIndexWriter
        solr.server.url : URL of the SOLR instance (mandatory)
        solr.commit.size : buffer size when sending to SOLR (default 1000)
        solr.mapping.file : name of the mapping file for fields (default solrindex-mapping.xml)
        solr.auth : use authentication (default false)
        solr.auth.username : use authentication (default false)
        solr.auth : username for authentication
        solr.auth.password : password for authentication
So I am assuming I am not creating the configuration correctly. Any suggestions?
Or should I be passing the config to run in a different way, maybe not using NutchConfiguration.create()?
There are 2 problems in your code:
- The solr.server.url property must be set directly in the Configuration object, not as a -D option. The message from Nutch assumes you are running from the command line, which is misleading here.
- As you mentioned, you are passing 2 different Configuration instances. NutchConfiguration.create() creates a Hadoop Configuration internally and adds the Nutch-specific resources, so you don't need to create one yourself. Also, ToolRunner passes the conf object to IndexingJob, so you don't need to pass it to the constructor.
So the correct code is:
    Configuration conf = NutchConfiguration.create();
    conf.set("solr.server.url", solrUrl);
    ToolRunner.run(conf, new IndexingJob(), args2);
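To see both problems in isolation, here is a minimal sketch that does not depend on Nutch or Hadoop at all. It assumes java.util.Properties as a stand-in for Hadoop's Configuration, and the names ConfigKeyDemo, FakeIndexingJob, run, brokenSetup and fixedSetup are hypothetical, invented only for this illustration. The run method imitates the behaviour described above, where the runner hands its own configuration to the job. It is not the real ToolRunner.

```java
import java.util.Properties;

public class ConfigKeyDemo {

    /** Hypothetical stand-in for IndexingJob: it keeps whatever configuration it was last given. */
    static class FakeIndexingJob {
        private Properties conf;

        FakeIndexingJob() { }

        FakeIndexingJob(Properties conf) { this.conf = conf; }

        void setConf(Properties conf) { this.conf = conf; }

        /** Returns the Solr URL the job can see, or null if the key is missing. */
        String solrUrl() {
            return conf.getProperty("solr.server.url");
        }
    }

    /** Hypothetical stand-in for the runner: it replaces the job's configuration with its own. */
    public static String run(Properties conf, FakeIndexingJob job) {
        job.setConf(conf);
        return job.solrUrl();
    }

    /** Mirrors the question: wrong key, set on a configuration that the runner then replaces. */
    public static String brokenSetup(String url) {
        Properties own = new Properties();
        own.setProperty("-D solr.server.url", url); // the key includes "-D ", so lookups by the plain name miss it
        return run(new Properties(), new FakeIndexingJob(own));
    }

    /** Mirrors the answer: exact key, set on the single configuration handed to the runner. */
    public static String fixedSetup(String url) {
        Properties conf = new Properties();
        conf.setProperty("solr.server.url", url);
        return run(conf, new FakeIndexingJob());
    }

    public static void main(String[] args) {
        String url = "http://localhost:8983/solr/collection1"; // placeholder URL, not from the question
        System.out.println("broken: " + brokenSetup(url));
        System.out.println("fixed:  " + fixedSetup(url));
    }
}
```

In this sketch brokenSetup returns null for two independent reasons: the value is stored under a key that is never looked up, and the configuration holding it is discarded when the runner supplies its own. Fixing only one of the two would still leave the job without a URL, which is why the corrected code changes both.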