Diskover file system crawler
posted by Chris Park  on April 21, 2017, 9:35 p.m. (3 months, 5 days ago)
48 Responses     0 Plus One's     9 Comments  

I'm developing a file system crawler to help our industry. It uses Elasticsearch and Kibana. I'm looking for beta testers to help me with the project. If you are interested, here is the GitHub link: https://github.com/shirosaidev/diskover .

Email me cpark16@gmail.com with any feedback/bugs.

Thanks


Thread Tags:
  discuss-at-studiosysadmins 

Response from Chris Park @ June 24, 2017, 3:07 p.m.

I've created a Google Group for discussions/questions about diskover.

https://groups.google.com/forum/?hl=en#!forum/diskover

 


0 Plus One's     0 Comments  
   

Response from Chris Park @ June 22, 2017, 12:56 a.m.

Hey Greg,

There were many bug fixes in the newest version of diskover-web (v1.0.2), if that is what you were using to search your diskover indices.

This may also help for tuning ES:

https://www.elastic.co/guide/en/elasticsearch/guide/current/heap-sizing.html

https://www.elastic.co/guide/en/elasticsearch/reference/current/heap-size.html
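The gist of those heap-sizing docs is a rule of thumb: give the ES heap roughly half of the machine's RAM, but keep it below about 31 GB so the JVM can still use compressed object pointers. A rough sketch (the helper name is mine, not part of diskover or Elasticsearch):

```python
# Rough ES heap-sizing rule of thumb from the linked Elastic docs:
# about half of RAM, capped below ~31 GB so the JVM keeps using
# compressed object pointers. Hypothetical helper, not official tooling.
def suggested_heap_gb(ram_gb):
    """Suggested -Xms/-Xmx value in GB for a machine with ram_gb of RAM."""
    return min(ram_gb // 2, 31)

# e.g. a 128G box would get -Xms31g -Xmx31g (jvm.options in ES 5.x)
print(suggested_heap_gb(128))  # 31
print(suggested_heap_gb(16))   # 8
```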

 


0 Plus One's     0 Comments  
   

Response from Chris Park @ June 22, 2017, 12:48 a.m.

diskover v1.1.1 has been released. It includes Gource visualization support, support for Windows and Python 3, and ES/Kibana 5.4. https://github.com/shirosaidev/diskover .

diskover-web panel v1.0.2 has also been released. Many bug fixes and improvements.

https://github.com/shirosaidev/diskover-web .

Enjoy!


0 Plus One's     0 Comments  
   

Response from Greg Dickie @ June 20, 2017, 5:20 p.m.
Hey David,
Can't remember the number of files exactly, but you're probably close. Trying Elasticsearch on 1 local node: i7 with 128G RAM on a fiber-mounted StorNext file system. Indexing is not really an issue. It's getting to see the data that's the problem. Nothing seems pegged for ES. It's just doing something. It's not returning. I can list indexes and otherwise interact, but searches seem to bog it down. I say it's because it's Java ;-)
Thanks, Greg

-- Greg Dickie, just a guy

0 Plus One's     0 Comments  
   

Response from David Leach @ June 20, 2017, 5:10 p.m.

You've probably got between 300 million and 500 million inodes, I suspect. That's a fair few documents to coalesce.

What kind of specs does your Elastic instance have? Are you seeing pegged CPU when you attempt to load it?

As for the scan performance, there's nothing magical going on under the hood, and you're not likely to see the benefits of scandir over listdir on NFS.





0 Plus One's     0 Comments  
   

Response from Greg Dickie @ June 20, 2017, 2:30 p.m.
Hi Chris,
Really trying to get this running, but ES confuses me or I'm doing something silly. Is anyone using this on an approx. 1.5PB filesystem? It takes about 5 hours to crawl, but that seems to work fine. Trying to run the web UI just hangs, and ES sits at 500% CPU for hours. Am I doing something silly, or do I just need to go through the process of tuning ES?
Then there's Kibana ;-) Back to the tutorials ....
Thanks, Greg


0 Plus One's     0 Comments  
   

Response from Chris Park @ June 17, 2017, 2 a.m.

Thanks for the feedback! I'm excited to release the next version, which will include Gource support. Here's another video of diskover using Gource, showing a real-time crawl using 8 worker threads. Hoping to have the new release out next week! Enjoy :)

https://youtu.be/qKLJjZ0TMqA

diskover gource

 


0 Plus One's     0 Comments  
   

Response from Rob Aitchison @ June 15, 2017, 8:55 p.m.
That is absolutely amazing, thank you for posting this preview of what might come next! Keep up the great work!!!


0 Plus One's     0 Comments  
   

Response from Chris Park @ June 15, 2017, 8:46 p.m.

The next version of diskover will include gource visualization support to visually see file system modifications over time by user. Here is a video showing an example https://youtu.be/InlfK8GQ-kM .


0 Plus One's     0 Comments  
   

Response from Anonymous @ June 15, 2017, 5 a.m.

Chris,

This looks great, I hope to get a chance to deploy this at some point.

All the best,

Mamading



-- 
Mamading Ceesay
Systems Administrator

VISION3 Ltd

0 Plus One's     0 Comments  
   

Response from Chris Park @ June 14, 2017, 5:54 p.m.

diskover v1.1.0 Elasticsearch file system crawler has been released. Faster crawl speed and also early support for Windows and Python 3. https://shirosaidev.github.io/diskover/


0 Plus One's     0 Comments  
   

Response from Chris Park @ May 31, 2017, 3:58 p.m.

diskover v1.0.15 has been released. New diskover web tag manager front-end and other bug fixes/enhancements. https://shirosaidev.github.io/diskover/


0 Plus One's     0 Comments  
   

Response from Anonymous @ May 29, 2017, 1:15 p.m.
I'm excited to give this a go; the integration with Elasticsearch is exactly what I'm looking for.
One observation: I see in your code that you are using os.listdir for the file scan. You might want to switch to scandir (see https://github.com/benhoyt/scandir); it does lots of caching of the stat calls, which may give you a speed boost. scandir is now part of Python 3, where part of it replaces the horribly slow os.walk.
Sam.
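Sam's suggestion can be sketched roughly as follows. This is an illustrative walker (the function name is mine, not diskover's actual code): os.scandir returns DirEntry objects whose type and stat results are cached from the directory read, so most files never need a separate stat() syscall.

```python
# Illustrative sketch of the scandir approach: os.scandir (Python 3.5+,
# or the scandir backport on Python 2) yields DirEntry objects whose
# is_dir()/is_file()/stat() results are cached from the directory read,
# avoiding a separate stat() call per file on most platforms.
import os

def walk_files(top):
    """Yield (path, size) pairs for every regular file under top."""
    for entry in os.scandir(top):
        if entry.is_dir(follow_symlinks=False):
            # recurse into subdirectories
            yield from walk_files(entry.path)
        elif entry.is_file(follow_symlinks=False):
            # stat info is served from the cached DirEntry where possible
            yield entry.path, entry.stat(follow_symlinks=False).st_size
```

On NFS, as David notes above, much of this caching benefit disappears, since the server round-trips dominate.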


0 Plus One's     1 Comments  
   

Response from Chris Park @ May 25, 2017, 10:35 p.m.

diskover v1.0.13 is out! New feature includes web front-end for searching and tagging files. https://shirosaidev.github.io/diskover/


0 Plus One's     0 Comments  
   

Response from Chris Park @ May 17, 2017, 1:40 p.m.

diskover v1.0.11 has been released https://shirosaidev.github.io/diskover/

For any new feature requests, please reply to this thread.

 


0 Plus One's     0 Comments  
   

Response from Greg Dickie @ May 11, 2017, 3 p.m.
Ah, was not sure if you could span indices. Time to do some reading.
Thanks, Greg

0 Plus One's     0 Comments  
   

Response from Chris Park @ May 11, 2017, 2:24 p.m.

For tracking projects, I suggest creating a new index each time you crawl from the top-level directory of the project.

Every week or month, for example: diskover-PROJA-2017.05.10, diskover-PROJA-2017.05.17, etc.

Then create a new Kibana index pattern diskover-PROJA-* to filter to just those indices.

To visually see the change over time, create a line chart in Kibana using sum of filesize for the y-axis, and set the x-axis to a date-histogram using the indexing_date field. Set the interval to weekly or monthly.
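A rough sketch of that naming scheme (the helper functions are mine, purely illustrative; the diskover-PROJA-* names are from this thread):

```python
# Sketch of per-crawl index naming: one dated index per crawl of a
# project tree, plus the wildcard pattern Kibana would use to span them.
from datetime import date

def crawl_index_name(project, crawl_date):
    """Dated diskover index name, e.g. diskover-PROJA-2017.05.10."""
    return "diskover-{}-{}".format(project, crawl_date.strftime("%Y.%m.%d"))

def kibana_pattern(project):
    """Kibana index pattern matching every crawl of the project."""
    return "diskover-{}-*".format(project)

print(crawl_index_name("PROJA", date(2017, 5, 10)))  # diskover-PROJA-2017.05.10
print(kibana_pattern("PROJA"))                       # diskover-PROJA-*
```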


0 Plus One's     0 Comments  
   

Response from Greg Dickie @ May 10, 2017, 7 p.m.
If we wanted to track the growth of a directory over time (e.g. for a given project), would we have to use the same index vs. a new one for each run?
Greg

0 Plus One's     1 Comments  
   

Response from Chris Park @ May 10, 2017, 6:06 p.m.

Diskover v1.0.9 has been released! Some bug fixes for dupes, and a new feature lets you create relational graphs using the X-Pack plugin. See the screenshots on the GitHub page for examples of dupes and hardlinks graphs.

https://shirosaidev.github.io/diskover/


0 Plus One's     0 Comments  
   

Response from Greg Dickie @ May 8, 2017, 3:05 p.m.
Nice. I was a bit stuck with Kibana, so thanks for providing more info on that.
Greg

0 Plus One's     0 Comments  
   

Response from Chris Park @ May 8, 2017, 2:35 p.m.

I’ve released version 1.0.7 of the Diskover filesystem crawler, which fixes some bugs and adds a new feature: the ability to run concurrent crawlers and stream data into the same Elasticsearch index.

Github link https://github.com/shirosaidev/diskover

Thanks to everyone who’s been helping with the project!


0 Plus One's     0 Comments  
   

Response from Greg Dickie @ May 5, 2017, 1:55 p.m.
Wow, works much better with versions from this century!
Sorry for the noise, guys; should have checked versions first.
Greg

0 Plus One's     0 Comments  
   

Response from Dave Young @ May 5, 2017, 1:50 p.m.

Relevant bits:
(find output) /usr/lib/python2.7/site-packages/elasticsearch-1.9.0-py2.7.egg-info
(rpm output) elasticsearch-1.7.3-1.noarch
python-elasticsearch-1.9.0-1.el7.noarch
Not sure why pip would have installed such an early version of the python client, but I'd suggest cloning it from here:


https://github.com/elastic/elasticsearch-py

and installing matching elasticsearch RPMs (and kibana while you're at it) from here:


https://www.elastic.co/downloads/elasticsearch
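The underlying requirement is that the elasticsearch-py client and the Elasticsearch server agree on major version (diskover expects 5.x on both sides). A trivial sketch of the check (the function name is mine, purely illustrative):

```python
# Sketch: the elasticsearch-py client major version should match the
# Elasticsearch server major version (e.g. a 5.x client with a 5.x server).
# Hypothetical helper for illustration only.
def same_major(client_version, server_version):
    """True if the two 'X.Y.Z' version strings share a major version."""
    return client_version.split(".")[0] == server_version.split(".")[0]

print(same_major("5.3.0", "5.3.0"))  # True
print(same_major("5.3.0", "1.7.3"))  # False: a 5.x client against a 1.x server
```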




 
DAVE YOUNG
Senior Systems Engineer
+1 212 337 3210
451 BROADWAY, 6TH FLOOR, NEW YORK, NY 10013
THEMILL.COM | @MILLCHANNEL | FACEBOOK.COM/MILLCHANNEL
From:
studiosysadmins-discuss-bounces@studiosysadmins.com <studiosysadmins-discuss-bounces@studiosysadmins.com> on behalf of Greg Dickie <greg@justaguy.ca>
Sent: Friday, May 5, 2017 1:32 PM
To: studiosysadmins-discuss@studiosysadmins.com
Subject: Re: [SSA-Discuss] Diskover file system crawler
Hey Dave (err mate),
  Would be nice if the client bitched about not being compatible with the server. Here is the output requested, don't see anything with a 5.x version:
[root@backup ~]$ find /usr/lib/ -name '*elasticsearch*' | grep python
/usr/lib/python2.7/site-packages/salt/modules/elasticsearch.py
/usr/lib/python2.7/site-packages/salt/modules/boto_elasticsearch_domain.pyc
/usr/lib/python2.7/site-packages/salt/modules/elasticsearch.pyc
/usr/lib/python2.7/site-packages/salt/modules/elasticsearch.pyo
/usr/lib/python2.7/site-packages/salt/modules/boto_elasticsearch_domain.py
/usr/lib/python2.7/site-packages/salt/modules/boto_elasticsearch_domain.pyo
/usr/lib/python2.7/site-packages/salt/returners/elasticsearch_return.py
/usr/lib/python2.7/site-packages/salt/returners/elasticsearch_return.pyc
/usr/lib/python2.7/site-packages/salt/returners/elasticsearch_return.pyo
/usr/lib/python2.7/site-packages/salt/states/elasticsearch_index_template.py
/usr/lib/python2.7/site-packages/salt/states/elasticsearch_index.py
/usr/lib/python2.7/site-packages/salt/states/elasticsearch_index.pyc
/usr/lib/python2.7/site-packages/salt/states/elasticsearch_index.pyo
/usr/lib/python2.7/site-packages/salt/states/elasticsearch_index_template.pyc
/usr/lib/python2.7/site-packages/salt/states/elasticsearch_index_template.pyo
/usr/lib/python2.7/site-packages/salt/states/boto_elasticsearch_domain.pyc
/usr/lib/python2.7/site-packages/salt/states/boto_elasticsearch_domain.py
/usr/lib/python2.7/site-packages/salt/states/boto_elasticsearch_domain.pyo
/usr/lib/python2.7/site-packages/elasticsearch
/usr/lib/python2.7/site-packages/elasticsearch-1.9.0-py2.7.egg-info
[root@backup ~]$ rpm -qa | grep -i elasticsearch
elasticsearch-1.7.3-1.noarch
python-elasticsearch-1.9.0-1.el7.noarch
There is a 2.4.5 version of elastic search in the repos.
Greg

On Fri, May 5, 2017 at 12:45 PM, Dave Young <davey@themill.com> wrote:

Mate, your python elasticsearch lib and elasticsearch bin versions don't match. Send the output of:

find /usr/lib/ -name '*elasticsearch*' | grep python
rpm -qa | grep -i elasticsearch

You'll probably find the python elasticsearch version you got from pip is 5.x and your (likely yum-installed) bin version is < 5.
In addition, when you get to the Kibana stage you'll likely need to update Elasticsearch again so the Kibana and Elasticsearch versions match, so just go to the Elastic website and get both.
 
DAVE YOUNG
Senior Systems Engineer
+1 212 337 3210
451 BROADWAY, 6TH FLOOR, NEW YORK, NY 10013
THEMILL.COM  |  @MILLCHANNEL  |  FACEBOOK.COM/MILLCHANNEL

From: studiosysadmins-discuss-bounces@studiosysadmins.com <studiosysadmins-discuss-bounces@studiosysadmins.com> on behalf of Greg Dickie <greg@justaguy.ca>
Sent: Friday, May 5, 2017 11:32 AM
To: studiosysadmins-discuss@studiosysadmins.com
Subject: Re: [SSA-Discuss] Diskover file system crawler
This is likely a stupid question. Trying to get this rolling on CentOS 7 and get:


[diskover ASCII-art banner]  v1.0.4  https://github.com/shirosaidev/diskover




[2017-05-05 11:28:27] [status] Connecting to Elasticsearch
[2017-05-05 11:28:27] [info] Checking for ES index: diskover-2017.04.22
[2017-05-05 11:28:27] [info] Creating ES index
Traceback (most recent call last):
  File "diskover.py", line 517, in <module>
    main()
  File "diskover.py", line 471, in main
    indexCreate(ES, INDEXNAME)
  File "diskover.py", line 410, in indexCreate
    ES.indices.create(index=INDEXNAME, body=mappings)
  File "/usr/lib/python2.7/site-packages/elasticsearch/client/utils.py", line 69, in _wrapped
    return func(*args, params=params, **kwargs)
  File "/usr/lib/python2.7/site-packages/elasticsearch/client/indices.py", line 103, in create
    params=params, body=body)
  File "/usr/lib/python2.7/site-packages/elasticsearch/transport.py", line 307, in perform_request
    status, headers, data = connection.perform_request(method, url, params, body, ignore=ignore, timeout=timeout)
  File "/usr/lib/python2.7/site-packages/elasticsearch/connection/http_urllib3.py", line 93, in perform_request
    self._raise_error(response.status, raw_data)
  File "/usr/lib/python2.7/site-packages/elasticsearch/connection/base.py", line 105, in _raise_error
    raise HTTP_EXCEPTIONS.get(status_code, TransportError)(status_code, error_message, additional_info)
elasticsearch.exceptions.RequestError: TransportError(400, u'MapperParsingException[mapping [file]]; nested: MapperParsingException[No handler for type [keyword] declared on field [owner]]; ')
I'm thinking it's a config error for Elasticsearch. Elasticsearch is running on the same machine for now and is listening on port 9200.
Can anyone (everyone) point me in the right direction?
Greg

On Thu, May 4, 2017 at 11:50 AM, Dave Young <davey@themill.com> wrote:

Just got this going here, pretty neat. Will mess around with it more and report any feedback.



 
DAVE YOUNG
Senior Systems Engineer
+1 212 337 3210
451 BROADWAY, 6TH FLOOR, NEW YORK, NY 10013
THEMILL.COM  |  @MILLCHANNEL  |  FACEBOOK.COM/MILLCHANNEL

From: studiosysadmins-discuss-bounces@studiosysadmins.com <studiosysadmins-discuss-bounces@studiosysadmins.com> on behalf of Chris Park <content@studiosysadmins.com>
Sent: Thursday, May 4, 2017 12:00 AM
To: studiosysadmins-discuss@studiosysadmins.com
Subject: Re: [SSA-Discuss] Diskover file system crawler

I've released version 1.0.4, which fixes some bugs and adds a new feature for finding duplicate files.

Github link https://github.com/shirosaidev/diskover 


Thanks to everyone who's been helping with the project!

 


To unsubscribe from the list send a blank e-mail to mailto:studiosysadmins-discuss-request@studiosysadmins.com?subject=unsubscribe




0 Plus One's     0 Comments  
   

Response from Dave Young @ May 5, 2017, 1:50 p.m.

I hit send too soon, forgot to mention:


https://github.com/shirosaidev/diskover

under Requirements:


Elasticsearch (local or AWS ES service, tested on Elasticsearch 5.3.0)
Kibana (tested on Kibana 5.3.0)



 
DAVE YOUNG
Senior Systems Engineer
+1 212 337 3210
451 BROADWAY, 6TH FLOOR, NEW YORK, NY 10013
THEMILL.COM | @MILLCHANNEL | FACEBOOK.COM/MILLCHANNEL

I'm thinking it's a config error for elasticsearch. elasticsearch is running on the same machine for now and is listening in port 9200. 
Can anyone (everyone) point me in the right direction.
Greg

On Thu, May 4, 2017 at 11:50 AM, Dave Young <davey@themill.com> wrote:

just got this going here, pretty neat. will mess around with it more and report any feedback



 
DAVE YOUNG
Senior Systems Engineer
+1 212 337 3210
451 BROADWAY, 6TH FLOOR, NEW YORK, NY 10013
THEMILL.COM  |  @MILLCHANNEL  |  FACEBOOK.COM/MILLCHANNEL

From: studiosysadmins-discuss-bounces@studiosysadmins.com <studiosysadmins-discuss-bounces@studiosysadmins.com> on behalf of Chris Park <content@studiosysadmins.com>
Sent: Thursday, May 4, 2017 12:00 AM
To: studiosysadmins-discuss@studiosysadmins.com
Subject: Re: [SSA-Discuss] Diskover file system crawler
  ** WARNING: This mail is from an external source **

Ive released version 1.0.4 which fixes some bugs and adds new feature for finding duplicate files.

Github link https://github.com/shirosaidev/diskover 

shirosaidev/diskover github.com diskover - Diskover File System Crawler

Thanks to everyone who's been helping with the project!

 


To unsubscribe from the list send a blank e-mail to mailto:studiosysadmins-discuss-request@studiosysadmins.com?subject=unsubscribe



--


Greg Dickie
just a guy 514-983-5400
To unsubscribe from the list send a blank e-mail to mailto:studiosysadmins-discuss-request@studiosysadmins.com?subject=unsubscribe



--


Greg Dickie
just a guy 514-983-5400

0 Plus One's     0 Comments  
   

Response from William Sandler @ May 5, 2017, 1:45 p.m.
For CentOS:
See here for installing Elasticsearch 5.x https://www.elastic.co/guide/en/elasticsearch/reference/current/rpm.html


William Sandler
All Things Media, LLC
Office: 201.818.1999 Ext. 158
william.sandler@allthingsmedia.com
0 Plus One's     0 Comments  
   

Response from Greg Dickie @ May 5, 2017, 1:35 p.m.
Hey Dave (err mate),
Would be nice if the client bitched about not being compatible with the server. Here is the output requested, don't see anything with a 5.x version:
[root@backup ~]$ find /usr/lib/ -name '*elasticsearch*' | grep python
/usr/lib/python2.7/site-packages/salt/modules/elasticsearch.py
/usr/lib/python2.7/site-packages/salt/modules/boto_elasticsearch_domain.pyc
/usr/lib/python2.7/site-packages/salt/modules/elasticsearch.pyc
/usr/lib/python2.7/site-packages/salt/modules/elasticsearch.pyo
/usr/lib/python2.7/site-packages/salt/modules/boto_elasticsearch_domain.py
/usr/lib/python2.7/site-packages/salt/modules/boto_elasticsearch_domain.pyo
/usr/lib/python2.7/site-packages/salt/returners/elasticsearch_return.py
/usr/lib/python2.7/site-packages/salt/returners/elasticsearch_return.pyc
/usr/lib/python2.7/site-packages/salt/returners/elasticsearch_return.pyo
/usr/lib/python2.7/site-packages/salt/states/elasticsearch_index_template.py
/usr/lib/python2.7/site-packages/salt/states/elasticsearch_index.py
/usr/lib/python2.7/site-packages/salt/states/elasticsearch_index.pyc
/usr/lib/python2.7/site-packages/salt/states/elasticsearch_index.pyo
/usr/lib/python2.7/site-packages/salt/states/elasticsearch_index_template.pyc
/usr/lib/python2.7/site-packages/salt/states/elasticsearch_index_template.pyo
/usr/lib/python2.7/site-packages/salt/states/boto_elasticsearch_domain.pyc
/usr/lib/python2.7/site-packages/salt/states/boto_elasticsearch_domain.py
/usr/lib/python2.7/site-packages/salt/states/boto_elasticsearch_domain.pyo
/usr/lib/python2.7/site-packages/elasticsearch
/usr/lib/python2.7/site-packages/elasticsearch-1.9.0-py2.7.egg-info
[root@backup ~]$ rpm -qa | grep -i elasticsearch
elasticsearch-1.7.3-1.noarch
python-elasticsearch-1.9.0-1.el7.noarch
There is a 2.4.5 version of Elasticsearch in the repos.
Greg


0 Plus One's     0 Comments  
   

Response from Dave Young @ May 5, 2017, 12:50 p.m.

mate, your python elasticsearch lib and elasticsearch bin versions don't match. Send the output of:


find /usr/lib/ -name '*elasticsearch*' | grep python
rpm -qa | grep -i elasticsearch

You'll probably find the python elasticsearch version you got from pip is 5.x and your (likely yum-installed) bin version is < 5.
In addition, when you get to the Kibana stage you'll likely need to update Elasticsearch again so that the Kibana and Elasticsearch versions match, so just go to the Elastic website and get both.
 
DAVE YOUNG
Senior Systems Engineer
+1 212 337 3210
451 BROADWAY, 6TH FLOOR, NEW YORK, NY 10013
THEMILL.COM | @MILLCHANNEL | FACEBOOK.COM/MILLCHANNEL
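To make the mismatch Dave describes easy to spot, a version check along these lines could work. This is only a sketch: the pure-Python comparison is self-contained, while the commented-out live check assumes the elasticsearch-py client is installed and a node is reachable on localhost:9200.

```python
# Compare major versions: an elasticsearch-py client generally needs a
# server with the same major version (a pip-installed 5.x lib will fail
# against a yum-installed 1.7.3 server, as in this thread).

def major(version_string):
    """Leading major-version number of a dotted version string."""
    return int(version_string.split(".")[0])

def majors_match(client_version, server_version):
    """True when client and server share the same major version."""
    return major(client_version) == major(server_version)

print(majors_match("1.9.0", "1.7.3"))   # True  (Greg's box: both 1.x)
print(majors_match("5.4.0", "1.7.3"))   # False (pip 5.x vs yum 1.x)

# Against a live node (commented out; needs a running cluster):
# import elasticsearch
# client_ver = ".".join(str(n) for n in elasticsearch.VERSION)
# server_ver = elasticsearch.Elasticsearch(["localhost:9200"]).info()["version"]["number"]
# print(majors_match(client_ver, server_ver))
```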

From: studiosysadmins-discuss-bounces@studiosysadmins.com <studiosysadmins-discuss-bounces@studiosysadmins.com> on behalf of Greg Dickie <greg@justaguy.ca>
Sent: Friday, May 5, 2017 11:32 AM
To: studiosysadmins-discuss@studiosysadmins.com
Subject: Re: [SSA-Discuss] Diskover file system crawler
This is likely a stupid question. Trying to get this rolling on CentOS 7 and I get:


[diskover ASCII-art banner]   v1.0.4   https://github.com/shirosaidev/diskover



[2017-05-05 11:28:27] [status] Connecting to Elasticsearch
[2017-05-05 11:28:27] [info] Checking for ES index: diskover-2017.04.22
[2017-05-05 11:28:27] [info] Creating ES index
Traceback (most recent call last):
  File "diskover.py", line 517, in <module>
    main()
  File "diskover.py", line 471, in main
    indexCreate(ES, INDEXNAME)
  File "diskover.py", line 410, in indexCreate
    ES.indices.create(index=INDEXNAME, body=mappings)
  File "/usr/lib/python2.7/site-packages/elasticsearch/client/utils.py", line 69, in _wrapped
    return func(*args, params=params, **kwargs)
  File "/usr/lib/python2.7/site-packages/elasticsearch/client/indices.py", line 103, in create
    params=params, body=body)
  File "/usr/lib/python2.7/site-packages/elasticsearch/transport.py", line 307, in perform_request
    status, headers, data = connection.perform_request(method, url, params, body, ignore=ignore, timeout=timeout)
  File "/usr/lib/python2.7/site-packages/elasticsearch/connection/http_urllib3.py", line 93, in perform_request
    self._raise_error(response.status, raw_data)
  File "/usr/lib/python2.7/site-packages/elasticsearch/connection/base.py", line 105, in _raise_error
    raise HTTP_EXCEPTIONS.get(status_code, TransportError)(status_code, error_message, additional_info)
elasticsearch.exceptions.RequestError: TransportError(400, u'MapperParsingException[mapping [file]]; nested: MapperParsingException[No handler for type [keyword] declared on field [owner]]; ')
I'm thinking it's a config error for elasticsearch. elasticsearch is running on the same machine for now and is listening on port 9200.
Can anyone (everyone) point me in the right direction?
Greg
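For context on the 400 in Greg's traceback: the `keyword` field type only exists in Elasticsearch 5.x; pre-5.x servers expect a non-analyzed `string` instead, so a 5.x-style mapping sent to a 1.7.3 node fails exactly like this. A hedged illustration of the difference (not diskover's actual mapping code; only the `owner` field name comes from the error message, and `downgrade_mapping` is a made-up helper):

```python
# Rewrite ES 5.x "keyword" fields to the pre-5.x equivalent,
# a non-analyzed "string". Purely illustrative of why the create
# call above was rejected by a 1.7.3 server.

def downgrade_mapping(properties):
    """Return a copy of a mapping with 5.x 'keyword' fields downgraded."""
    out = {}
    for field, spec in properties.items():
        if spec.get("type") == "keyword":
            out[field] = {"type": "string", "index": "not_analyzed"}
        else:
            out[field] = dict(spec)
    return out

mappings_5x = {
    "owner": {"type": "keyword"},      # field named in the error message
    "filesize": {"type": "long"},      # hypothetical extra field
}
print(downgrade_mapping(mappings_5x))
```

In practice the simpler fix, as suggested elsewhere in the thread, is to install matching 5.x versions of the server, the python client, and Kibana rather than downgrading the mapping.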


0 Plus One's     0 Comments  
   

Response from Greg Dickie @ May 5, 2017, 12:05 p.m.
I did, and it seems fine to me.
[root@backup diskover master]$ git checkout -- diskover.cfg
[root@backup diskover master]$ curl -X POST 'http://localhost:9200/tutorial/helloworld/1' -d '{ "message": "Hello World!" }'
{"_index":"tutorial","_type":"helloworld","_id":"1","_version":1,"created":true}
[root@backup diskover master]$ curl -X GET 'http://localhost:9200/tutorial/helloworld/1'
{"_index":"tutorial","_type":"helloworld","_id":"1","_version":1,"found":true,"_source":{ "message": "Hello World!" }}
[root@backup diskover master]$ pip install kibana
Requirement already satisfied (use --upgrade to upgrade): kibana in /usr/lib/python2.7/site-packages
Requirement already satisfied (use --upgrade to upgrade): elasticsearch in /usr/lib/python2.7/site-packages (from kibana)
Requirement already satisfied (use --upgrade to upgrade): argparse in /usr/lib/python2.7/site-packages (from kibana)
Requirement already satisfied (use --upgrade to upgrade): requests in /usr/lib/python2.7/site-packages (from kibana)
Requirement already satisfied (use --upgrade to upgrade): urllib3<2.0,>=1.8 in /usr/lib/python2.7/site-packages (from elasticsearch->kibana)
You are using pip version 8.1.2, however version 9.0.1 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
[root@backup diskover master]$ pip install elasticsearch
Requirement already satisfied (use --upgrade to upgrade): elasticsearch in /usr/lib/python2.7/site-packages
Requirement already satisfied (use --upgrade to upgrade): urllib3<2.0,>=1.8 in /usr/lib/python2.7/site-packages (from elasticsearch)
You are using pip version 8.1.2, however version 9.0.1 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
I see the change from 1.0.4 to 1.0.5 had to do with numeric values of owner and group. In this case everything in that directory is owned by root.
Thanks for the help,
Greg
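The numeric owner/group change Greg refers to is consistent with a common crawler pattern: resolve the uid from `os.lstat()` to a login name and fall back to the bare number when there is no passwd entry. This is a sketch of that pattern only; `owner_name` is a hypothetical helper, not diskover's actual code:

```python
# Resolve a numeric uid to an owner name, falling back to the number as a
# string when the uid has no passwd entry (e.g. files from another system).
import pwd

def owner_name(uid):
    """Return the login name for a uid, or the uid as a string if unknown."""
    try:
        return pwd.getpwuid(uid).pw_name
    except KeyError:
        return str(uid)

print(owner_name(0))           # 'root' on virtually every Unix system
print(owner_name(999999999))   # no such uid, so falls back to '999999999'
```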



0 Plus One's     0 Comments  
   

Response from William Sandler @ May 5, 2017, 11:50 a.m.
Did you:
pip install elasticsearch
pip install kibana

Also,
Can you confirm elasticsearch is running properly by providing the output of:
curl -X POST 'http://localhost:9200/tutorial/helloworld/1' -d '{ "message": "Hello World!" }'

curl -X GET 'http://localhost:9200/tutorial/helloworld/1'


William Sandler
All Things Media, LLC
Office: 201.818.1999 Ext. 158
william.sandler@allthingsmedia.com
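For reference, the same sanity check can be built with Python's standard library alone (Python 3 shown; the `tutorial`/`helloworld` index and type are throwaway test values exactly as in the curl commands, and actually sending the requests needs a node listening on localhost:9200):

```python
# Build the equivalent of the two curl commands above with urllib.
# A Request with a body defaults to POST; without one it defaults to GET.
import json
import urllib.request

def index_request(host, index, doc_type, doc_id, document=None):
    """Build a POST (with a body) or GET (without) request for one document."""
    url = "http://%s/%s/%s/%s" % (host, index, doc_type, doc_id)
    if document is None:
        return urllib.request.Request(url)  # GET
    body = json.dumps(document).encode("utf-8")
    return urllib.request.Request(url, data=body,
                                  headers={"Content-Type": "application/json"})

post = index_request("localhost:9200", "tutorial", "helloworld", "1",
                     {"message": "Hello World!"})
get = index_request("localhost:9200", "tutorial", "helloworld", "1")
print(post.get_method(), post.full_url)   # POST http://localhost:9200/tutorial/helloworld/1
print(get.get_method(), get.full_url)     # GET http://localhost:9200/tutorial/helloworld/1

# To actually send them (commented out; needs a running node):
# print(urllib.request.urlopen(post).read())
# print(urllib.request.urlopen(get).read())
```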

0 Plus One's     0 Comments  
   

Response from Greg Dickie @ May 5, 2017, 11:40 a.m.
Did a git pull so I assume I now have the latest and greatest. Same problem. Seems like an issue creating the index. I know nothing about ES, unfortunately (not a Java fan). Any other ideas?
Thanks, Greg
--


Greg Dickie
just a guy
514-983-5400

0 Plus One's     0 Comments  
   

Response from Greg Dickie @ May 5, 2017, 11:35 a.m.
This is likely a stupid question. I'm trying to get this rolling on CentOS 7 and get:


[diskover ASCII-art banner] v1.0.4  https://github.com/shirosaidev/diskover


[2017-05-05 11:28:27] [status] Connecting to Elasticsearch
[2017-05-05 11:28:27] [info] Checking for ES index: diskover-2017.04.22
[2017-05-05 11:28:27] [info] Creating ES index
Traceback (most recent call last):
  File "diskover.py", line 517, in <module>
    main()
  File "diskover.py", line 471, in main
    indexCreate(ES, INDEXNAME)
  File "diskover.py", line 410, in indexCreate
    ES.indices.create(index=INDEXNAME, body=mappings)
  File "/usr/lib/python2.7/site-packages/elasticsearch/client/utils.py", line 69, in _wrapped
    return func(*args, params=params, **kwargs)
  File "/usr/lib/python2.7/site-packages/elasticsearch/client/indices.py", line 103, in create
    params=params, body=body)
  File "/usr/lib/python2.7/site-packages/elasticsearch/transport.py", line 307, in perform_request
    status, headers, data = connection.perform_request(method, url, params, body, ignore=ignore, timeout=timeout)
  File "/usr/lib/python2.7/site-packages/elasticsearch/connection/http_urllib3.py", line 93, in perform_request
    self._raise_error(response.status, raw_data)
  File "/usr/lib/python2.7/site-packages/elasticsearch/connection/base.py", line 105, in _raise_error
    raise HTTP_EXCEPTIONS.get(status_code, TransportError)(status_code, error_message, additional_info)
elasticsearch.exceptions.RequestError: TransportError(400, u'MapperParsingException[mapping [file]]; nested: MapperParsingException[No handler for type [keyword] declared on field [owner]]; ')
I'm thinking it's a config error for Elasticsearch. Elasticsearch is running on the same machine for now and is listening on port 9200.
Can anyone (everyone) point me in the right direction?
Greg
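For context: a `No handler for type [keyword]` error usually means the Elasticsearch server predates 5.0, where the `keyword` field type was introduced; older servers expect `string` with `index: not_analyzed` instead. A minimal sketch of version-aware mapping selection (the helper name is illustrative, not diskover's actual code):

```python
# Sketch: pick a mapping for the 'owner' field that is compatible with the
# server's major version. The 'keyword' type only exists in Elasticsearch
# >= 5.0; pre-5.x servers use 'string' with 'index': 'not_analyzed'.
# The function name is hypothetical, for illustration only.

def owner_field_mapping(es_version):
    """Return a field mapping appropriate for the given ES version string."""
    major = int(es_version.split('.')[0])
    if major >= 5:
        return {'type': 'keyword'}
    return {'type': 'string', 'index': 'not_analyzed'}
```

So one quick check is `curl localhost:9200` and comparing the reported version number against what diskover's mappings assume.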

--


Greg Dickie
just a guy
514-983-5400

0 Plus One's     0 Comments  
   

Response from William Sandler @ May 5, 2017, 11:35 a.m.
Try 1.0.5. I think I had the same issue on 1.0.4.

William Sandler
All Things Media, LLC
Office: 201.818.1999 Ext. 158
william.sandler@allthingsmedia.com

0 Plus One's     0 Comments  
   

Response from Dave Young @ May 4, 2017, 11:55 a.m.

Just got this going here, pretty neat. Will mess around with it more and report any feedback.



 
DAVE YOUNG
Senior Systems Engineer
+1 212 337 3210
451 BROADWAY, 6TH FLOOR, NEW YORK, NY 10013
THEMILL.COM | @MILLCHANNEL | FACEBOOK.COM/MILLCHANNEL



0 Plus One's     0 Comments  
   

Response from Chris Park @ May 4, 2017, midnight

I've released version 1.0.4, which fixes some bugs and adds a new feature for finding duplicate files.

Github link https://github.com/shirosaidev/diskover 

Thanks to everyone who's been helping with the project!

 


0 Plus One's     0 Comments  
   

Response from William Sandler @ May 1, 2017, 9:30 a.m.
At the risk of annoying you all with a third email in a row, I'm going to link to the instructions because I was told it's still not showing up correctly in people's emails.
https://docs.google.com/document/d/1HL9Mh7pepQxTbEA1tVMrSPvDIwHi5c4wH3y6vBXqo-w/edit?usp=sharing

Happy Monday.



William Sandler
All Things Media, LLC
Office: 201.818.1999 Ext. 158
william.sandler@allthingsmedia.com

0 Plus One's     1 Comments  
   

Response from William Sandler @ April 30, 2017, 6:50 p.m.
Fixed Images:
sudo -i
apt-get install python2.7
sudo apt-get install default-jre
apt-get install apt-transport-https
echo "deb https://artifacts.elastic.co/packages/5.x/apt stable main" | sudo tee -a /etc/apt/sources.list.d/elastic-5.x.list

apt-get update && apt-get install elasticsearch
systemctl enable elasticsearch.service
apt-get update && apt-get install kibana
systemctl enable kibana.service
nano /etc/kibana/kibana.yml
Uncomment #server.host: "localhost" and change it to server.host: "YourServersName"
apt-get install python-pip
pip install elasticsearch
pip install kibana
sudo systemctl restart elasticsearch
sudo systemctl restart kibana
mkdir ~/diskover
cd ~/diskover
git clone https://github.com/shirosaidev/diskover.git
cd diskover
python diskover.py -m 0
(This will create the initial Diskover index in Elasticsearch; it's just a test run.)
In a web browser navigate to YourKibanaServer:5601
You will be brought to a page that looks like this:
[screenshot: Kibana "Configure an index pattern" page]
Change "logstash-*" to "diskover-*" without the quotes and uncheck "Index contains time-based events".
You'll be brought to the page below. Hit the orange refresh button. You may have to hit the refresh button again after running your first crawl that actually indexes files.
[screenshot: Kibana index fields page]
Now navigate to the Kibana webpage again and click Management > Index Patterns > Diskover > File Size > Format > and change it to bytes.
Go back to command line and:
cd /the/path/you/want/to/index/
python ~/diskover/diskover.py
Navigate to the Dashboard Page on your Kibana webpage and see the visual results.
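Once an index exists, you can also query it from Python rather than only through Kibana. A sketch that builds a query body for the largest indexed files; the field names (`filesize`, `filename`, `path_parent`) are assumptions based on diskover's mapping, so verify them against your own index:

```python
# Sketch: query a diskover index from Python instead of Kibana.
# Field names below are assumed from diskover's mapping; check your index.

def largest_files_query(n=10):
    """Build an Elasticsearch query body for the n largest indexed files."""
    return {
        'size': n,
        'sort': [{'filesize': {'order': 'desc'}}],
        '_source': ['filename', 'path_parent', 'filesize'],
    }

# Against a live cluster this would be run roughly as:
#   from elasticsearch import Elasticsearch
#   es = Elasticsearch(['localhost:9200'])
#   res = es.search(index='diskover-*', body=largest_files_query())
#   for hit in res['hits']['hits']:
#       print(hit['_source'])
```

The search call itself is left as a comment since it needs a running cluster.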


William Sandler
All Things Media, LLC
Office: 201.818.1999 Ext. 158
william.sandler@allthingsmedia.com



0 Plus One's     0 Comments  
   

Response from William Sandler @ April 30, 2017, 6:35 p.m.
Thanks for releasing this program; it's great for whitebox storage, as M. Oliver said. If I may make a feature request: it would be great if it listed potentially identical files based on filename and file size, and then maybe output the total "wasted" space of the identical files.
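A rough sketch of that report, walking the filesystem directly with stdlib Python (diskover itself would presumably answer the same question from its Elasticsearch index instead):

```python
import os
from collections import defaultdict

# Sketch of the requested report: group files by (name, size) and total the
# space occupied by every copy beyond the first of each group. This walks
# the filesystem directly rather than using diskover's index.

def wasted_space(root):
    """Return bytes used by potential duplicates (same name and size) under root."""
    groups = defaultdict(list)
    for dirpath, _, files in os.walk(root):
        for name in files:
            full = os.path.join(dirpath, name)
            try:
                size = os.path.getsize(full)
            except OSError:
                continue  # broken symlink, permission error, etc.
            groups[(name, size)].append(full)
    # every copy past the first of each (name, size) pair counts as "wasted"
    return sum(size * (len(paths) - 1)
               for (_, size), paths in groups.items()
               if len(paths) > 1)
```

Name plus size only flags candidates, of course; a hash comparison would be needed to confirm true duplicates.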

I put the following guide together for people who have no Elasticsearch or Kibana experience but have a basic grasp of the Ubuntu command line.
This guide assumes a fresh install of Ubuntu 16.04 Server.
sudo -i
apt-get install python2.7
sudo apt-get install default-jre
apt-get install apt-transport-https
echo "deb https://artifacts.elastic.co/packages/5.x/apt stable main" | sudo tee -a /etc/apt/sources.list.d/elastic-5.x.list

apt-get update && apt-get install elasticsearch
systemctl enable elasticsearch.service
apt-get update && apt-get install kibana
systemctl enable kibana.service
nano /etc/kibana/kibana.yml
Uncomment #server.host: "localhost" and change it to server.host: "YourServersName"
apt-get install python-pip
pip install elasticsearch
pip install kibana
sudo systemctl restart elasticsearch
sudo systemctl restart kibana
mkdir ~/diskover
cd ~/diskover
git clone https://github.com/shirosaidev/diskover.git
cd diskover
python diskover.py -m 0
(This will create the initial Diskover index in Elasticsearch; it's just a test run.)
In a web browser navigate to YourKibanaServer:5601
You will be brought to a page that looks like this:
[screenshot: Kibana "Configure an index pattern" page]
Change "logstash-*" to "diskover-*" without the quotes and uncheck "Index contains time-based events".
You'll be brought to the page below. Hit the orange refresh button. You may have to hit the refresh button again after running your first crawl that actually indexes files.
[screenshot: Kibana index fields page]

Now navigate to the Kibana webpage again and click Management > Index Patterns > Diskover > File Size > Format > and change it to bytes.
Go back to command line and:
cd /the/path/you/want/to/index/
python ~/diskover/diskover.py
Navigate to the Dashboard Page on your Kibana webpage and see the visual results.


William Sandler
All Things Media, LLC
Office: 201.818.1999 Ext. 158
william.sandler@allthingsmedia.com


0 Plus One's     0 Comments  
   

Response from Chris Park @ April 29, 2017, 8:09 p.m.

Thanks to everyone who's shown interest in the Diskover file system crawler and who's been helping with testing. I've released version 1.0.3, which fixes some bugs that you reported and adds some new features. For anyone interested in trying it out now that it's a bit more stable, here is the GitHub link: https://github.com/shirosaidev/diskover .


0 Plus One's     0 Comments  
   

Response from Michael Oliver @ April 26, 2017, 7:10 p.m.
Great project Chris. Going to play with this for some of our white box storage.
There is a commercial product that is pretty sweet and works with distributed systems: ClarityNow. Super fast, but it does cost $$.
FYI, for those with Isilon: InsightIQ has been free for a while. You just need to request the license.
Michael Oliver
mcoliver@gmail.com
858.336.1438


0 Plus One's     1 Comments  
   

Response from Chris Park @ April 25, 2017, 10:33 p.m.

Thanks for all the interest in Diskover. After you've had a chance to test it out, please post any feedback/bugs/feature requests on the GitHub issues page (link below).

Remember that it is early in beta so you may find it buggy. I am doing my own testing on it and will be updating frequently so please update to the latest version before posting any bugs.

https://github.com/shirosaidev/diskover/issues

Thanks everyone and I look forward to more people testing.


0 Plus One's     0 Comments  
   

Response from Zorion Terrell @ April 25, 2017, 8:35 p.m.

Our pipeline team created a disk crawler to do this, but it's crummy.

I'd very much be interested in checking it out!

 

 

Zorion Terrell

IT Manager | DHX Studios

e: zorion.terrell@dhxmedia.com

t: 604-684-2363 | m: 604-562-5148

380 West 5th Ave

Vancouver, BC Canada V5Y 1J5

 


 

 


0 Plus One's     1 Comments  
   

Response from Greg Dickie @ April 25, 2017, 8:35 a.m.
Sounds like a much-needed tool. We'll try it out.
Thanks, Greg

--
Greg Dickie
just a guy

0 Plus One's     1 Comments  
   

Response from Chris Park @ April 25, 2017, 12:40 a.m.

What started me working on the Diskover fs crawler was wanting to help studios understand their unstructured data better. At every studio I worked at we never had a clear view of our data, and our storage was always getting full. I know that some very recent storage companies are finally giving insights into the data (Qumulo), but a lot of studios out there still don't really know about their "dark data".

We all know the trend of throwing money at the problem by purchasing more and more expensive storage. It's usually the only choice, since a lot of production teams are in a "delete nothing" mentality, even after a project wraps. I think a lot of this comes from a lack of insight into what is on the storage. I've often seen cache files, duplicates, and compressed archives sitting on storage and wasting space.

If your studio has built an intelligent enough pipeline that asset file metadata is properly recorded, then that is awesome and maybe your storage is not at that all-too-familiar 95% mark.

Diskover is not designed to automate deletion or movement of data; it helps you quickly crawl your data for analysis in Elasticsearch/Kibana and gives you a better understanding of which files are old and probably no longer used. From there you can decide what data should be deleted or moved to archive.
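To give an idea of the analysis side: once a crawl has populated an index, finding stale files comes down to a range query against Elasticsearch. Here is a minimal sketch that just builds the query body; the field names last_access and filesize are my assumptions for illustration, not confirmed diskover index mappings, and you would POST the body to your own cluster's /_search endpoint:

```python
import json


def stale_files_query(days=180, min_size=0):
    """Build an Elasticsearch query body matching files whose last-access
    time is older than `days` days, largest files first.

    NOTE: `last_access` and `filesize` are assumed field names."""
    return {
        "query": {
            "bool": {
                "must": [
                    # Files not accessed in the last `days` days.
                    {"range": {"last_access": {"lte": "now-%dd" % days}}},
                    # Optionally ignore tiny files.
                    {"range": {"filesize": {"gte": min_size}}},
                ]
            }
        },
        # Biggest space wasters at the top of the results.
        "sort": [{"filesize": {"order": "desc"}}],
        "size": 100,
    }


if __name__ == "__main__":
    # e.g. POST this to http://localhost:9200/diskover-*/_search
    print(json.dumps(stale_files_query(days=365), indent=2))
```

From the hits you can build a candidate list for archiving, but the decision to delete or move anything stays with you.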


0 Plus One's     0 Comments  
   

Response from Brandon Lindauer @ April 24, 2017, 5:50 p.m.

I don't know about using the wrong storage. I'd say that storage with good analytics wasn't really a thing until relatively recently. If your storage doesn't give you that information then either 1) you have more requirements, or 2) the "right" storage is out of your budget. 
And a crawler should be read-only or mount the volume read-only. 
My question is: can it aggregate multiple file systems? If so, it's a short jump to start looking at which files exist in multiple locations. 
On Apr 24, 2017, at 12:37 AM, julian firminger <justdigitalfilm@gmail.com> wrote:

Hi Chris,
What do you see as the use case(s) from a studio-specific perspective? As a general rule, if you need some sort of crawler for storage that is housing assets (of all flavors) then you're a) storing it wrong to begin with and b) going to get a nasty shock when some bit of automation goes and deletes or moves things that are somehow tacitly still in production.
We have a need to dynamically understand whose data is where for a variety of reasons; will this be able to compile real-time usage and volume migration data (file based)?

Julian Firminger

Snr. Systems Administrator - Attempted Full-Stack Engineer
United Broadcast Facilities
Amsterdam, The Netherlands

On Sat, Apr 22, 2017 at 3:35 AM, Chris Park <content@studiosysadmins.com> wrote:

I'm developing a file system crawler to help our industry. It uses Elasticsearch and Kibana. I'm looking for beta testers to help me with the project. If you are interested here is the git link https://github.com/shirosaidev/diskover .

Email me cpark16@gmail.com with any feedback/bugs.

Thanks



0 Plus One's     1 Comments  
   

Response from Ali Poursamadi @ April 24, 2017, 3:40 p.m.
Qumulo does provide relatively fresh (1-2 minutes old) information on the disk usage of every directory in the file system. You can navigate the tree and see disk usage in a visual web interface, or get the data in JSON format through the API. The JSON reports contain other "aggregates" as well, in this format:

{
  "files": [
    {
      "num_symlinks": "163",
      "name": "_PRD",
      "num_other_objects": "0",
      "data_usage": "98263399424000",
      "num_files": "5524981",
      "capacity_usage": "98289241350144",
      "num_directories": "314405",
      "meta_usage": "25841926144",
      "type": "FS_FILE_TYPE_DIRECTORY",
      "id": "5414002688"
    },
    {
      "num_symlinks": "0",
      "name": "test",
      "num_other_objects": "0",
      "data_usage": "953154867200",
      "num_files": "8800",
      "capacity_usage": "953233731584",
      "num_directories": "28",
      "meta_usage": "78864384",
      "type": "FS_FILE_TYPE_DIRECTORY",
      "id": "13697004036"
    },
    {
      "num_symlinks": "0",
      "name": "_SYS",
      "num_other_objects": "0",
      "data_usage": "3787313152",
      "num_files": "1250",
      "capacity_usage": "3793309696",
      "num_directories": "175",
      "meta_usage": "5996544",
      "type": "FS_FILE_TYPE_DIRECTORY",
      "id": "5447002688"
    }
  ],
  "total_files": "5535033",
  "total_other_objects": "0",
  "total_capacity": "99247707312128",
  "total_symlinks": "163",
  "total_data": "99221780467712",
  "path": "/",
  "total_directories": "314608",
  "total_meta": "25926844416",
  "id": "2"
}
-Ali
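A report like that is easy to post-process. A minimal sketch that ranks directories by capacity usage; the field names come straight from the report above (note Qumulo reports sizes as strings), though the sample here is trimmed to the fields actually used:

```python
import json

# Trimmed sample of a Qumulo aggregates report (sizes are strings).
REPORT = """{"files": [
  {"name": "_PRD", "capacity_usage": "98289241350144", "num_files": "5524981"},
  {"name": "test", "capacity_usage": "953233731584", "num_files": "8800"},
  {"name": "_SYS", "capacity_usage": "3793309696", "num_files": "1250"}],
 "total_capacity": "99247707312128", "path": "/"}"""


def rank_dirs(report_json):
    """Return (name, bytes, percent-of-total) per directory, largest first.
    Sizes arrive as strings, so cast to int before sorting."""
    report = json.loads(report_json)
    total = int(report["total_capacity"])
    rows = [(d["name"], int(d["capacity_usage"])) for d in report["files"]]
    rows.sort(key=lambda r: r[1], reverse=True)
    return [(name, size, 100.0 * size / total) for name, size in rows]


if __name__ == "__main__":
    for name, size, pct in rank_dirs(REPORT):
        print("%-8s %16d bytes  %5.1f%%" % (name, size, pct))
```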

On Mon, Apr 24, 2017 at 7:08 AM, Victor Olmedo <vaolmedo@gmail.com> wrote:
Has anyone looked at Qumulo for this?
On Mon, Apr 24, 2017 at 10:05 AM, Mathieu Arseneault <mathieu@tistik.com> wrote:
Hi Chris,
Through Kibana, are we able to drill down through the directory tree? If so you've got a tester here...
Thanks, Mathieu
On Mon, Apr 24, 2017 at 3:37 AM, julian firminger <justdigitalfilm@gmail.com> wrote:
Hi Chris,
What do you see as the use case(s) from a studio-specific perspective? As a general rule, if you need some sort of crawler for storage that is housing assets (of all flavors) then you're a) storing it wrong to begin with and b) going to get a nasty shock when some bit of automation goes and deletes or moves things that are somehow tacitly still in production.
We have a need to dynamically understand whose data is where for a variety of reasons; will this be able to compile real-time usage and volume migration data (file based)?

Julian Firminger

Snr. Systems Administrator - Attempted Full-Stack Engineer
United Broadcast Facilities
Amsterdam, The Netherlands

On Sat, Apr 22, 2017 at 3:35 AM, Chris Park <content@studiosysadmins.com> wrote:

I'm developing a file system crawler to help our industry. It uses Elasticsearch and Kibana. I'm looking for beta testers to help me with the project. If you are interested here is the git link https://github.com/shirosaidev/diskover .

Email me cpark16@gmail.com with any feedback/bugs.

Thanks





--
Victor A Olmedo



0 Plus One's     0 Comments  
   

Response from Mathieu Arseneault @ April 24, 2017, 10:10 a.m.
Hi Chris,
Through Kibana, are we able to drill down through the directory tree? If so you've got a tester here...
Thanks, Mathieu
On Mon, Apr 24, 2017 at 3:37 AM, julian firminger <justdigitalfilm@gmail.com> wrote:
Hi Chris,
What do you see as the use case(s) from a studio-specific perspective? As a general rule, if you need some sort of crawler for storage that is housing assets (of all flavors) then you're a) storing it wrong to begin with and b) going to get a nasty shock when some bit of automation goes and deletes or moves things that are somehow tacitly still in production.
We have a need to dynamically understand whose data is where for a variety of reasons; will this be able to compile real-time usage and volume migration data (file based)?

Julian Firminger

Snr. Systems Administrator - Attempted Full-Stack Engineer
United Broadcast Facilities
Amsterdam, The Netherlands

On Sat, Apr 22, 2017 at 3:35 AM, Chris Park <content@studiosysadmins.com> wrote:

I'm developing a file system crawler to help our industry. It uses Elasticsearch and Kibana. I'm looking for beta testers to help me with the project. If you are interested here is the git link https://github.com/shirosaidev/diskover .

Email me cpark16@gmail.com with any feedback/bugs.

Thanks




0 Plus One's     1 Comments  
   

Response from Victor Olmedo @ April 24, 2017, 10:10 a.m.
Has anyone looked at Qumulo for this?
On Mon, Apr 24, 2017 at 10:05 AM, Mathieu Arseneault <mathieu@tistik.com> wrote:
Hi Chris,
Through Kibana, are we able to drill down through the directory tree? If so you've got a tester here...
Thanks, Mathieu
On Mon, Apr 24, 2017 at 3:37 AM, julian firminger <justdigitalfilm@gmail.com> wrote:
Hi Chris,
What do you see as the use case(s) from a studio-specific perspective? As a general rule, if you need some sort of crawler for storage that is housing assets (of all flavors) then you're a) storing it wrong to begin with and b) going to get a nasty shock when some bit of automation goes and deletes or moves things that are somehow tacitly still in production.
We have a need to dynamically understand whose data is where for a variety of reasons; will this be able to compile real-time usage and volume migration data (file based)?

Julian Firminger

Snr. Systems Administrator - Attempted Full-Stack Engineer
United Broadcast Facilities
Amsterdam, The Netherlands

On Sat, Apr 22, 2017 at 3:35 AM, Chris Park <content@studiosysadmins.com> wrote:

I'm developing a file system crawler to help our industry. It uses Elasticsearch and Kibana. I'm looking for beta testers to help me with the project. If you are interested here is the git link https://github.com/shirosaidev/diskover .

Email me cpark16@gmail.com with any feedback/bugs.

Thanks





--
Victor A Olmedo

0 Plus One's     0 Comments  
   

Response from Julian Firminger @ April 24, 2017, 3:40 a.m.
Hi Chris,
What do you see as the use case(s) from a studio-specific perspective? As a general rule, if you need some sort of crawler for storage that is housing assets (of all flavors) then you're a) storing it wrong to begin with and b) going to get a nasty shock when some bit of automation goes and deletes or moves things that are somehow tacitly still in production.
We have a need to dynamically understand whose data is where for a variety of reasons; will this be able to compile real-time usage and volume migration data (file based)?

Julian Firminger

Snr. Systems Administrator - Attempted Full-Stack Engineer
United Broadcast Facilities
Amsterdam, The Netherlands

On Sat, Apr 22, 2017 at 3:35 AM, Chris Park <content@studiosysadmins.com> wrote:

I'm developing a file system crawler to help our industry. It uses Elasticsearch and Kibana. I'm looking for beta testers to help me with the project. If you are interested here is the git link https://github.com/shirosaidev/diskover .

Email me cpark16@gmail.com with any feedback/bugs.

Thanks




0 Plus One's     1 Comments