Generate SitemapTool.jar from the SiteMapTester #44

Open
GoogleCodeExporter opened this issue Mar 23, 2015 · 7 comments

Comments

@GoogleCodeExporter

Currently, in our sitemaps package we have the following file: 
SiteMapTester.java

The file name is misleading, as we shouldn't have a test file in our regular src 
tree (as Lewis has previously mentioned).

After examining the file, I think I understand why it exists: it takes an 
online sitemap and parses it recursively, printing every URL entry to the 
console. If the sitemap is an index, it parses each of the sitemaps it 
references in turn and prints their URL entries as well.
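
For readers unfamiliar with the idea, the recursive walk described above can be sketched roughly as follows. This is only an illustration, not the code in SiteMapTester.java and not the crawler-commons API: the class name `SitemapWalkSketch`, the method `collectUrls`, and the `fetcher` function (which maps a sitemap URL to its XML text) are all hypothetical, and the sketch uses only the JDK's XML parser. It assumes the documents declare the standard sitemaps.org 0.9 namespace.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class SitemapWalkSketch {
    // Namespace used by sitemap and sitemap index documents (sitemaps.org protocol 0.9).
    static final String NS = "http://www.sitemaps.org/schemas/sitemap/0.9";

    // Hypothetical entry point: returns every page URL reachable from rootUrl.
    // "fetcher" is an assumption of this sketch; it returns the XML text for a URL, or null.
    static List<String> collectUrls(String rootUrl, Function<String, String> fetcher) throws Exception {
        List<String> pages = new ArrayList<>();
        walk(rootUrl, fetcher, new HashSet<>(), pages);
        return pages;
    }

    private static void walk(String url, Function<String, String> fetcher,
                             Set<String> visited, List<String> pages) throws Exception {
        if (!visited.add(url)) {
            return; // already seen: avoids looping on self-referencing indexes
        }
        String xml = fetcher.apply(url);
        if (xml == null) {
            return; // nothing could be fetched for this URL
        }
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Element root = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
        boolean isIndex = "sitemapindex".equals(root.getLocalName());
        NodeList locs = root.getElementsByTagNameNS(NS, "loc");
        for (int i = 0; i < locs.getLength(); i++) {
            String loc = locs.item(i).getTextContent().trim();
            if (isIndex) {
                walk(loc, fetcher, visited, pages); // index entry: descend into the child sitemap
            } else {
                pages.add(loc); // urlset entry: a page URL
            }
        }
    }

    public static void main(String[] args) throws Exception {
        // In-memory stand-in for the network, so the sketch runs offline.
        Map<String, String> fake = Map.of(
            "http://example.com/index.xml",
            "<sitemapindex xmlns=\"" + NS + "\"><sitemap><loc>http://example.com/a.xml</loc></sitemap></sitemapindex>",
            "http://example.com/a.xml",
            "<urlset xmlns=\"" + NS + "\"><url><loc>http://example.com/page1</loc></url></urlset>");
        for (String u : collectUrls("http://example.com/index.xml", fake::get)) {
            System.out.println(u);
        }
    }
}
```

Passing the fetcher in as a function keeps the networking separate from the traversal, so the same walk could sit on top of any HTTP client.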

I actually like this sitemap parsing util, because it gives me something that 
our library doesn't support natively.

My most common scenario is parsing sitemaps recursively to get the list of 
URLs. This was my original requirement when I stumbled upon this library: I 
have a PHP script site, and I wanted a list of all of my URLs...

We should have this functionality (parsing a sitemap recursively while 
retrieving a list of URLs) as a separate jar tool.



I'd suggest SiteMapTool.

It would be cleanest if this was a separate artifact from the build - e.g. we 
create a crawler-commons jar, and a crawler-commons-tools.jar, where the latter 
is an uber jar (includes all dependencies) so you can just run it from the 
command line.


We should also rename the original Java file accordingly.

Original issue reported on code.google.com by [email protected] on 4 Jul 2014 at 9:36

@GoogleCodeExporter
Author

Original comment by [email protected] on 8 Jul 2014 at 4:31

  • Added labels: Type-Enhancement
  • Removed labels: Type-Defect

@GoogleCodeExporter
Author

I added this class as a simple way of checking what we were getting for a given 
sitemap URL. OK for renaming it to SiteMapTool, but I don't think we need to 
provide the recursive parsing, as this can easily be built on top of CC. I'd 
also rather avoid multiplying the jars we produce, especially for such small 
functionality.
What about building this functionality outside CC and hosting it as a separate 
project, e.g. on GitHub?

Original comment by digitalpebble on 9 Jul 2014 at 8:07

@GoogleCodeExporter
Author

I still think we need this functionality in-house.

I am playing with sitemap parsing, and I keep coming back to this tool and 
using it.


There is a place for a GitHub project using netty or whatever Ken suggested 
for heavy-duty sitemap parsing, one that uses our library as the third-party 
sitemap parser and something else for the heavy-duty networking.


But I still think we need it in-house.

Maybe we should put it in the "test" folder?

What is so bad about another jar?

Original comment by [email protected] on 11 Jul 2014 at 2:48

@GoogleCodeExporter
Author

"What is so bad about another jar?"

Having a separate JAR file for such small functionality does not make sense, 
and we want to avoid multiplying them, as I explained above.

Original comment by digitalpebble on 14 Jul 2014 at 1:10

@GoogleCodeExporter
Author

OK.

This is the conclusion of what we will do in this issue:

Just rename the file to use the "Tool" suffix instead of "Test".


Please note that this issue will be taken care of after the submission of 
issue39.

Original comment by [email protected] on 18 Jul 2014 at 7:08

  • Changed state: Accepted

@GoogleCodeExporter
Author

Please note that this issue will be taken care of after the submission of 
issue43, and not issue39 as I wrote in the last comment.

Original comment by [email protected] on 18 Jul 2014 at 7:20

@GoogleCodeExporter
Author

Original comment by [email protected] on 18 Jul 2014 at 8:05
