blogs_tmp.csv
title;seo_title;url;author;publish_date;category;locales;content
Elastic Cloud and Meltdown;;/blog/elastic-cloud-and-meltdown;Elastic Engineering;January 08, 2018;Engineering;; Elastic is aware of the and we are addressing them for Elastic Cloud. We know that you entrust your data to our cloud service, and we take the confidentiality of all data very seriously. At this time, we are not aware of any exploit on our cloud service that utilized the Meltdown or Spectre vulnerabilities. Impact Assessment The Meltdown and Spectre vulnerabilities apply when untrusted code can execute on a system. At the host infrastructure level, we know that both our infrastructure providers (AWS and GCP) have patched their systems, and are no longer vulnerable. At the Elastic Cloud service level, Elastic Cloud allows you to upload some artifacts, such as plug-ins, dictionaries, and scripts. These uploads provide a potential vector of attack that could exploit Meltdown. Old Elasticsearch clusters on version 1.x are more vulnerable. Apart from these uploads, Elastic Cloud does not allow untrusted code execution. Based on our assessment, we believe the impact of Meltdown and Spectre on Elastic Cloud to be small. We have focused our efforts on mitigation and control while we carry out our regular process for operating system patches in an accelerated fashion. Mitigation We disabled non-sandboxed scripting for all Elasticsearch 1.x clusters as a primary, customer-visible mitigation. We have also disabled self-service uploads of custom bundles from you until we have fully completed our patching. Behind the scenes, we’ve further increased our observability of system-level calls and isolated clusters running version 1.x of Elasticsearch on their own hosts. Patching and Customer Impact We are using an accelerated version of our regular maintenance procedure to perform OS-level updates while maintaining service availability for user clusters. Your clusters will not experience downtime from this operating system patching. There is considerable speculation on the internet about the performance impact of the patches to address Meltdown. We have begun Elasticsearch-specific testing to establish the impact for our common benchmarking scenarios and will publish a blog post with this information as soon as we have it. Further Assistance As always, our support team is available to answer any questions you have regarding Meltdown, Spectre, and our handling of them. Please use your standard support channel to ask those questions.
Brewing in Beats: Password Keystore;;/blog/brewing-in-beats-password-keystore;Monica Sarbu;January 05, 2018;Brewing in Beats;; Did you know that is already available? Try it and let us know what you think. If you are curious to see the Beats in action, we just published the . This update includes the changes over the last two weeks. Password keystore We have merged the which allows users to store sensitive information in an obfuscated data store on disk instead of defining it in plaintext in the YAML configuration. # create a new keystore on disk ./metricbeat keystore create # add a new key to the store. ./metricbeat keystore add elasticsearch_password # remove a key from the store ./metricbeat keystore remove elasticsearch_password # list the configured keys without the sensitive information ./metricbeat keystore list You can then reference the keys from the keystore using the same syntax that we use for the environment variables: password: ${elasticsearch_password} In the current implementation, the passwords are not encrypted in the keystore, only obfuscated. This new feature is planned to be released with the 6.2 release. Structured logging in libbeat This refactors the logging of libbeat and adds support for structured logging. The new logging implementation is based on , which is one of the most efficient structured logging libraries for Golang. To switch to the JSON format, simply add to the configuration file. Another enhancement is that the Beats can also . By setting , all logs will be written to the Application log. The source name will be the name of the Beat. Besides this, there are no changes to the user-facing logging configuration. The non-JSON logger output has some format differences, but, in general, it will have a more consistent format across outputs. These changes are only in the master branch at the moment, but we will likely include them in 6.2. Metricbeat: Read HAProxy metrics over HTTP Thanks to , the HAProxy module can in addition to the TCP socket. This means HTTP authentication is also supported when reading the stats. The improvement will be available in the 6.2 release. Other changes: Repository: elastic/beats Affecting all Beats Changes in master: Metricbeat Changes in 6.1: Packetbeat Changes in master: Auditbeat Changes in master: Testing Changes in master: Changes in 6.1: Changes in 6.0: Infrastructure Changes in master: Documentation Changes in 6.1:
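A minimal sketch of how the keystore described above ties into a Beat's configuration. The commands are the ones quoted in the post; the output section shown in the comments is an illustrative assumption, not text from the post:

```bash
# Store the Elasticsearch password once, outside the YAML config (Metricbeat 6.2+)
./metricbeat keystore create
./metricbeat keystore add elasticsearch_password   # prompts for the value

# metricbeat.yml can then reference the key with environment-variable syntax,
# for example (illustrative snippet):
#   output.elasticsearch:
#     hosts: ["localhost:9200"]
#     username: "elastic"
#     password: "${elasticsearch_password}"
```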
Logstash Lines: Update for January 2, 2018;;/blog/logstash-lines-2018-01-02;Andrew Cholakian;January 02, 2018;The Logstash Lines;; We're back from the holidays with some great new features! In 6.0 we released the Logstash Pipeline Viewer. To get to this view, users would first have to go through the Pipelines view. This view listed the user's pipelines as cards, with each card listing out the pipeline's versions. We are targeting 6.2 for a redesigned Pipelines view to not only serve as a navigational tool to the Pipeline Viewer but also to provide users with a summary view of their pipelines' health. Many thanks to several Elasticians who volunteered their time to help usability test designs for the new Pipelines view. With users will be able to specify the pipeline ID from the CLI from 6.2.0 onward with the new --pipeline.id option. This is useful for those not using the pipelines.yml file. More consistent handling of IP addr lookups (Targeting LS 7.0) in TCP input as of Better handling of empty source fields in geoip plugin as of We have a tool that can generate a list of LS deps + their licenses for legal compliance reasons.
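A quick sketch of the new CLI flag mentioned above; the config file path is illustrative, not taken from the post:

```bash
# Logstash 6.2+: give the pipeline a descriptive ID straight from the CLI,
# useful when pipelines.yml is not in use
bin/logstash -f /etc/logstash/conf.d/apache_access.conf --pipeline.id apache-access-logs
```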
This Year in Elasticsearch and Apache Lucene - 2017;;/blog/this-year-in-elasticsearch-and-apache-lucene-2017;Adrien Grand;January 01, 2018;Engineering;; As the Earth's rotation reaches the point where we close out another Gregorian calendar year, we wanted to share one last week in Lucene. Lucene is the core component that Elasticsearch is built on; we've seen users who might never have known about Lucene without Elasticsearch, and there are that mention that Lucene bugs were found via Elasticsearch. The work we do on both projects is a commitment that we take great pride in. Here is a non-exhaustive list, in no particular order, of improvements that we made to Lucene over the course of 2017 that we hope you find interesting. And, as a bit of history, here is Shay Banon's , way back in 2006. We're really looking forward to growing this list of Elastic contributors in 2018.
Kibana: This week in Kibana for December 26, 2017;;/blog/keeping-up-with-kibana-2017-12-26;Jim Goodwin;December 26, 2017;Kurrently in Kibana;; Welcome to This is a weekly series of posts on new developments in the Kibana project and any related learning resources and events. Let's talk breaking changes. Wondering how we migrated to be compatible with 6.0? Tyler walks through how the team navigated the removal of mapping types. — elastic (@elastic) Hi all, Well, this is the last post in this series for 2017, and it has been an incredibly productive year for Kibana. Thank you for all of the enhancement requests, PRs, error reports, AMA questions, and forum posts; they help us to make Kibana better for everyone. I won't reiterate all of the changes from 2017, but if you haven't upgraded Kibana in a while you should really read the release posts below; there is a lot of value there, and we have a packed road map for 2018. That's all for this week, have a great end of 2017, and a happy new year! Cheers, Jim
The Elastic Advent Calendar 2017, Week 4;;/blog/elastic-advent-calendar-2017-week-four;Aaron Aldrich;December 26, 2017;Engineering;; As we mentioned in our , the Engineering team here at Elastic wanted to celebrate the end of 2017 via our own tech-advent series. We took a lot of inspiration from both the (fully in Japanese) and (in English) and we’d like to thank them for providing the awesome quality we have aspired to maintain. We have summarised weeks , and in previous blog posts, and this post covers the last and final (albeit short) week and also provides a summary of all the topics that were posted in the series. Here are all 25 topics: Dec 1: by Mark Walkom Dec 2: by Jun Ohtani Dec 3: by Tyler Hannan Dec 4: by David Pilato Dec 5: by Tal Levy Dec 6: by Jongmin Kim Dec 7: by Christopher Wurm Dec 8: by Medcl Zeng Dec 9: by Jordan Sissel Dec 10: by Philipp Krenn Dec 11: by Thiago Souza Dec 12: by Atonio Bonuccelli Dec 13: by Aaron Aldrich Dec 14: by Jun Ohtani Dec 15: by Abdon Pijpelink Dec 16: by David Pilato Dec 17: by Mat Schaffer Dec 18: by Jongmin Kim Dec 19: by Tyler Langlois Dec 20: by Medcl Zeng Dec 21: by Sherry Ger Dec 22: by Thiago Souza Dec 23: by Bhavya Mandya Dec 24: by Philipp Krenn Dec 25: by Mark Walkom Thank You! We will be keeping all of the topics available on the so you can refer back to them at any time. And, as these are Discuss topics, you can also continue the conversation with the authors! Thanks for following on through this series; we hope it’s provided some useful inspiration for your use of the Elastic Stack. If you’d like us to repeat this, or if you have ideas for next year, please let us know via or feel free to create a topic in our with your comments. We hope 2017 has been a great year and we look forward to 2018 being even better!
Solving the Small but Important Issues with Fix-It Fridays;;/blog/solving-the-small-but-important-issues-with-fix-it-fridays;Daniel Cecil;December 22, 2017;Culture;; Question: How many engineers does it take to change a light bulb? Answer: The light bulb works fine on the system in my office... OK. It isn’t a great joke. But it’s the perfect setup for discussing an important topic here at Elastic: How do busy engineers, often working on large and gnarly projects, handle the small issues — like changing a metaphorical light bulb — that inevitably pop up from time to time? The answer: Fix-It Friday. The Elasticsearch code is housed in a public repository and accessible to anyone. When a user finds bugs, spots missing features, or wants to make a specific request, they can flag it using the issues tab by simply submitting a new issue. The process is open and transparent — just the way we like it. Each day, someone on the Elasticsearch team is assigned to a role called support dev help. In this role, the engineer has the dual duty of aiding the Elastic support team while looking for fresh issues in the Elasticsearch repository. When a new issue arises, the engineer will add a label to help the team prioritize when to tackle it, and how much effort it might take to solve it. However, not all issues have a simple diagnosis, nor an easy fix. “If there’s enough information, but it’s not clear that the issue is something we really want to handle due to policy, or maybe the person handling the ticket doesn’t have enough knowledge in the issue area to make a decision on it, then we can mark the ticket ‘discuss’ and it goes into the queue for Fix-It Friday,” said Colin Goodheart-Smithe, Elasticsearch Software Engineer. Elasticsearch Team Lead Clint Gormley created the Fix-It Friday initiative a little over three years ago as a time when these small issues were given to engineers to solve. That ambitious concept didn’t last very long. The team quickly learned that small issues often turned out to be big ones in disguise. (Think: the filament in the light bulb looks dead, but in reality the electricity is out.) So, the scope of Fix-It Friday evolved into a get-together for discussing user requests and finding solutions. Since the Elastic team is distributed, the meetup also became a weekly opportunity to get off Slack and email and get focused on a team video call. “It’s a good time,” said Gormley, “getting a group with such a wide range of expertise in one virtual room — it’s amazing.” About 10 issues are discussed during a typical one-hour Fix-It Friday session. Issues are later fixed and implemented or de-escalated. When asked whether there was a particular issue from a Fix-It Friday meeting that jumped out at him, or that he thought was quirky or fun, Gormley laughed. “We’ve only been through 12,000 issues or something ….” But one seemingly small bug hiding something larger did spring to mind. Users reported heavy queries submitted to Elasticsearch never timing out, and Gormley recalled queries which ran for hours. “Usually, our queries run in milliseconds, so if one runs for an hour, you know you have a problem,” he explained. In these situations users, thinking nothing is happening, run the query again. So, instead of one running for an hour, they actually have two — or more. This isn’t exactly an issue that could break anything, but it had the potential to slow results and tie up resources. The issue was marked for discussion at a Fix-It Friday session.
After a lengthy debate, Elastic engineers considered adding a default timeout, meaning that after one hour the query would be canceled. It seemed like a good idea at first. But with several eyeballs on the issue, another perspective developed. Data is stored in indexes mapped out to shards, which are situated on different machines. When you run a query, it reaches out to all the shards, gathering the results and providing those results to the user. But what happens if one of the shards is missing due to a dying node on the shard, or when it gets disconnected from the network, causing the heavy query to fail? Should Elasticsearch show an exception? Or show only the results from the available shards? Users performing a simple search might be happy with getting results only from available shards. But users performing analytics would want to know that they’re receiving partial results. For the timeout option, Elastic engineers decided that a silent timeout (when you do not get a notification that the query stopped running) was out of the question. They also considered throwing an exception so that the user knew something was wrong with the query. But what of other circumstances, such as a missing shard, that can create partial results? Should that throw a hard exception too? In the end, they decided to add a global and per-request setting to toggle this behavior. The timeout discussion turned out to be too large a decision for one engineer to make on their own. “From a user perspective it’s important that we actually look at these things,” said Gormley. “Our users are very involved. If they’ve taken the time to write a decent issue, we owe it to them to respond appropriately.” This is where the value of Fix-It Friday really comes into play — it’s a broadening of the collective Elastic mind. For engineers, Fix-It Friday is a chance to break from the day-to-day and think about new issues in different ways, providing an opportunity to meditate on a problem that may not be their particular focus but is part of the larger product. In the end, Fix-It Friday isn’t about simply fixing bugs, or fielding requests — it’s about widening the scope of what Elastic can do. “It's about making decisions,” said Elasticsearch Software Engineer Adrien Grand. “It’s about which direction we want to take.” “You see people asking us to add features that work on small datasets but won’t scale,” said Gormley. “If we make something as a small-scale solution, inevitably someone will want to use it on the big scale and it will fail. That kind of stuff is important for new devs to know so that they can make these decisions later on. There’s an ethos to how we develop: guiding principles of what to add, and what not to add.” However, Gormley added, nothing is set in stone. “That willingness to change minds is an important part of the Elastic culture.” “As usual in open source,” added Adrien Grand, “no is temporary, but yes is forever.”
iPrice Group & the largest e-commerce catalog in Southeast Asia – powered by Elasticsearch;;/blog/iprice-group-and-the-largest-ecommerce-catalog-in-southeast-asia-powered-by-elasticsearch;Heinrich Wendel;December 22, 2017;User Stories;; E-commerce is a very young and fragmented space in Southeast Asia (SEA). Unlike the United States, where Amazon is well established as the no. 1 player in online shopping, there are tens of thousands of entrepreneurs fighting for the favor of local shoppers with no clear leader in sight. Moreover, customers in our seven countries that constitute SEA, namely , , , , , and , have their individual preferences and unique tastes in accordance with their local culture. Here at iPrice, we set out with the mission to build SEA’s one-stop-shopping destination, aggregating the product catalogs of all these merchants into a single shopping experience for the end user. They wouldn’t have to visit each merchant one-by-one to find the products they are searching for: instead, iPrice categorizes the products and presents them to shoppers in a well-organized and visually-appealing fashion. The idea of product discovery was born, with the goal to make e-commerce more accessible and credible to the . Targeting 250 million SKUs to a population of almost 600 million people At the beginning we had to ask ourselves the all too well-known question, “What technology platform to base iPrice on?” While a traditional SQL approach would have secured us easy access to developer talent, we were concerned about scalability. The two most popular e-commerce stores in SEA were launched just less than five years ago, but we knew that the region was about to experience its internet moment in a similar way to how China did a few years earlier. We were looking for a solution that was simple to set up with a small start-up team while we had little traffic and only a couple of million products, but scaled easily whenever the internet burst happened. From a functional perspective, e-commerce is all about search. Shoppers are trying to find one or two items they want to buy out of a catalog of hundreds of thousands of items. We are not a simple store but an aggregator of all stores—meaning we would have to deal with a scale that is one order of magnitude higher, carrying hundreds of millions of items. Our eyes naturally fell on Elasticsearch, a Lucene-based solution which was already renowned for its full-text search capabilities and had gathered a solid reputation. Still undecided about whether we should pair Elasticsearch with a SQL-based primary data store, we thought through the customer purchasing journey through online portals. We found that customers only want to see the most recent information, meaning if they click on a product and it turns out to be out-of-stock on the merchant’s site, they can’t buy it and we lose a potential lead. As such, we had to make sure that our product catalog is always up-to-date, while avoiding any additional replication or synchronization of the data which would potentially take a couple of hours. While product data is the core of what we deal with, we also have to store the navigation structure of our portals, supplemental content that provides shoppers with contextual information about products they are interested in, and last but not least, data about the shoppers’ behavior on the site.
Again, the question was whether to add a secondary SQL database or not, but the nature of this kind of data is also not very relational and Elasticsearch was already renowned for holding large amounts of log information. We settled on implementing our own CMS on top of Elasticsearch as our primary data store, going completely NoSQL in our approach, and this has benefited us in the long run. Importing >630 GB every 24 hours into a cluster with 320 GB of memory The following diagram illustrates our architecture in a nutshell. At the end of every day, our partners provide us with their latest product catalog in the form of CSV or XML files. After midnight, when it is unlikely that there are any updates to their inventory and the load on Elasticsearch is minimal, we start the import process. It is timed very accurately, to ensure most late night shoppers in SEA have gone to bed and the countries with different time-zones, like Indonesia, have also passed the end of the day. Within two and a half years, our product catalog has grown to 250 million products, and with that, it is quite natural that every day a couple of the feeds error out or provide invalid data. To catch these, we use the powerful aggregation capabilities of Elasticsearch to create reports showing how the new catalog differs from the old one. First thing in the morning, our Ops team looks at the report, and decides which parts of the new catalog are good to go out to our site. During this phase, our navigation structure is dynamically updated based on the new product catalog and written into another index—the content index. This pre-calculation makes sure that heavy aggregations won’t slow down the site during customer visits, an important lesson that we will expand on in detail later. At the same time the log data is analyzed and each product gets a new popularity value assigned. This value is used in the frontend for sorting the items, making sure that we show the most relevant ones to our users. Singapore and Hong Kong are relatively small in terms of their population with a couple of million, whereas Indonesia’s hundreds of islands account for more than 260 million people. Their levels of development also couldn’t be more different: while Singapore is amongst the world’s most modern cities, Indonesia’s capital is known for the world’s worst traffic. We serve a totally different product catalog in each country and had to specifically cater to SEA’s various demographics of online shoppers as well as their different consumer behavior, which affects our frontend performance. We decided to duplicate our index structure for each country, keep the light content indices in a dedicated cluster, and run each of the heavy product indices on its own cluster. This guarantees that only the required dataset for the individual country has to be loaded into memory for queries, but balances out the load across all nodes in the most efficient way. What we have learned while building this architecture The implementation of the first couple of versions of our application was quite minimal. At the same time, the amount of uncompressed data didn't exceed the limit of a few gigabytes. At that time, imports only took a few hours each night and were finished at the beginning of the day. Over time, the amount of data being imported increased more than 10-fold, which influenced the speed of the imports. The time it took was up to 10 hours, and we had to begin exploring possible ways to further optimize the process.
The first implementation of our infrastructure was quite simple. We used PaaS offerings such as Qbox and Amazon’s Elasticsearch service. This was justified as long as the dataset was within tens of gigabytes. It served us well in quickly setting up an Elasticsearch cluster and scaling it with our growing traffic. It has its limitations though; for example, we were not able to tweak cluster settings like queue sizes, thread pool limits, or shard allocation. Migrating to EC2 self-hosted nodes allowed us to optimize our database for the mostly query-heavy operations during the day, while running a quick nightly import of our product catalog. At the same time, Elastic was developing new versions at a rapid speed, introducing a lot of performance optimizations, and such a setup allows us to upgrade as soon as a stable version is released. Moving further, while performing a set of benchmark tests, we decided to go with a multi-cluster setup: running each node as a separate cluster. Scaling up the number of nodes in the clusters did not lead to a linear performance increase. Import speed with a one-node cluster is ~14,000 documents per second; adding a second node to the cluster gives ~20,000 documents per second, approximately a 50% improvement. At the same time, such a setup allowed us to separate the heavy product catalog indices at the country level. Furthermore, we found that Elastic Block Store volumes are not inferior in performance to directly attached instance storage, which has a relatively limited size. Provisioning data volumes of ~3.3 TB allowed us to get the maximum IOPS performance that AWS provides for one general purpose SSD volume. Average disk utilization, during ingesting and heavy aggregations, was below 70%. It is worth mentioning that our major goals were ingest and search performance. In order to keep our infrastructure fault tolerant, we have developed certain backup and restore policies as well as introduced a caching layer on the frontend. Next, we started using bulk requests instead of single index requests and set the refresh interval to -1 during import (a brief sketch of this tuning appears at the end of this post). This reduced the HTTP overhead and improved indexing speed significantly. The exact bulk sizing depends on the average document size and we ran a few tests with different configurations to find the best performing configuration in our case. The basic idea was to measure and optimize the number of documents that are inserted in one bulk and the . We made measurements with bulk sizes in the range of 100–25,000 documents and the number of threads ranging from 10–1,100. In our particular case, with an average document size between ~3-5 KB and one m4.4xlarge node per cluster each having 16 cores, the optimal configuration is 70 simultaneously running threads with a bulk size equal to 7,000 documents. This might sound very big—it’s about 75 MB per request—but since both our clusters and backend are located in the same AWS datacenter, the network bandwidth was not an issue. If you set the number of simultaneously running threads and the bulk size too high though, the cluster nodes are unable to process the data in their queues and requests will be rejected. Import time could be decreased by a factor of five this way. The import application runs on a c4.8xlarge instance with 36 CPU cores and 60 GB of memory. Our use case involves post-processing of data after it has been imported. Roughly speaking, we have to update, remove, or insert documents, applying a set of rules, and index them with supplemental content. Here we benefited from that can be used together with the bulk API.
Since documents in Elasticsearch are immutable, the update API follows a retrieve-change-reindex process. With a partial update, we can specify only the fields of the document that should be updated. The merging process happens within each shard, avoiding unnecessary data transfer. The equivalent of bulk for getting data is the , which makes sure that data is held in memory and doesn’t have to be requeried while retrieving it in multiple chunks. Again, only the required fields should be retrieved using . All this helped us to solve issues with slow post-processing of the imported data. Since we are serving seven different countries, with an import running on a nightly basis, we create one new index per country in every cluster. Using one index per day makes sure that only the current version has to be held in memory by Elasticsearch and does not affect query performance. In order to release the freshest data set in each country, every day we use aliases, which give us the flexibility to swap in the new index atomically (a sketch of this alias swap appears at the end of this post); we don’t have to update anything on our frontend to use the new index name, as the aliases take care of all of this. Over time, as our website offered more functionality, our queries and aggregations became heavier, and the response time of some queries increased dramatically, as did the nodes’ CPU usage. On one hand, deep investigation showed that we did not use the filter cache efficiently; on the other hand, some queries were just not well optimized. In order to improve them, we split the queries that retrieve actual hits from the aggregations, since a dynamic query that retrieves actual documents (hits) does not get cached. We still issue both together using the to avoid additional trips between our frontend and the database. You can quickly overlook that you have to manually enable the for each of your indexes. Keep in mind that the cache key is the complete JSON query, so if you change a small part, let’s say only the indenting, it will have to re-run the complete query. We then got rid of exclude or include statements in aggregations. Filtering data in the query means there is less data to aggregate, which has a significant impact on performance. Next, we noticed that running aggregations on an analyzed data field slows down response time significantly due to an explosion of possible terms. As an example, we used the path hierarchy analyzer for some document fields. With more and more documents, the time taken by aggregation queries on analyzed data fields increased to as much as 800 ms. Obviously, performance of certain pages went down. In order to address this, we defined raw fields in addition to the analyzed fields and ran aggregations on them where possible. When timed, the duration required for aggregation was around 30-40 ms. Future improvements and optimizations on the way to 1 billion products So far, we have only scratched the surface of Elasticsearch’s capabilities and we have plenty of ideas to further exploit its full potential. Here is a short excerpt of the projects we have on our backlog to give you an idea of the possibilities with Elasticsearch: Based on our experience, we can confidently say that Elasticsearch is the technology of choice to implement in these scenarios. There are always stumbling blocks along the road and we are planning to share our experience about implementing these scenarios in subsequent blog posts, so that you don’t have to stumble over them yourself. Stay tuned. About the Authors Heinrich Wendel is the co-founder & CTO of iPrice Group Sdn Bhd.
After working for Microsoft for four years, Heinrich left his position as a Product Manager for Visual Studio in Seattle and moved to Malaysia in 2014 to initiate iPrice. With an affinity for bridging user experience and technology, he aims to make iPrice a leader in Southeast Asia's young e-commerce ecosystem. Combining NLT and a data-driven approach with visual discovery, iPrice strives to provide customers with the most relevant products and coupons amongst the plethora of products on the internet. Anton is the lead DevOps Engineer at iPrice Group Sdn Bhd. He has been a part of iPrice since its inception in 2014. Anton is passionate about system architecture, overcoming challenges, and has an extremely strong affinity for automation and system software development. iPrice Group is a meta-search website where consumers can easily compare prices and specs and discover products with hundreds of local and regional merchants. iPrice’s meta-search platform is available in seven markets across Southeast Asia, namely Singapore, Malaysia, Indonesia, the Philippines, Thailand, Vietnam, and Hong Kong. Currently, iPrice compares and catalogues more than 200 million products and receives more than five million monthly visits across the region. iPrice currently operates three business lines: price comparison for electronics and health & beauty; product discovery for fashion and home & living; and coupons across all verticals.
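Two minimal sketches of the techniques described in the iPrice post above. Hosts, index names, and values are illustrative assumptions, not taken from the post. First, the import-time tuning: pause refresh on the per-country product index for the nightly bulk load, then restore it.

```bash
# Pause refresh before the nightly _bulk import on the new per-country index
curl -XPUT 'localhost:9200/products_my_2017-12-22/_settings' -H 'Content-Type: application/json' -d'
{ "index": { "refresh_interval": "-1" } }'

# ...run the bulk import (e.g. 7,000-document _bulk requests from ~70 client threads)...

# Restore a normal refresh interval once the import has finished
curl -XPUT 'localhost:9200/products_my_2017-12-22/_settings' -H 'Content-Type: application/json' -d'
{ "index": { "refresh_interval": "1s" } }'
```

And second, the daily alias swap that releases the freshest per-country index to the frontend in a single atomic call:

```bash
# Point the country alias at today's index and drop yesterday's in one request
curl -XPOST 'localhost:9200/_aliases' -H 'Content-Type: application/json' -d'
{
  "actions": [
    { "remove": { "index": "products_my_2017-12-21", "alias": "products_my" } },
    { "add":    { "index": "products_my_2017-12-22", "alias": "products_my" } }
  ]
}'
```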
Elastic Advent Calendar 2017, Week 3;;/blog/elastic-advent-calendar-2017-week-three;Mark Walkom;December 22, 2017;Engineering;; Week three of the leads us towards the last of the series. If you’re just joining us, this calendar is how we’re celebrating the end of the year — by sharing a daily Elastic Stack tip in our community . You can catch up on the first two weeks by heading . French, Korean and English feature this week, and we have topics ranging from Elastic Cloud Enterprise, through to building your own crawler that indexes pages to Elasticsearch and a fantastic post on monitoring Kubernetes using the Elastic Stack. As usual you can follow along live by checking out or subscribing to the , or watch for the daily tweeting on . Here’s a sample of what we’ve posted from our third week: Dec 15: by Abdon Pijpelink Dec 16: [FR][Elasticsearch] Performance testing for your Elasticsearch plugin, by David Pilato Dec 17: by Mat Schaffer Dec 18: by Jongmin Kim Dec 19: by Tyler Langlois Dec 20: by Medcl Zeng Dec 21: by Sherry Ger! If you’re looking for other great pieces of reading, we recommend checking out our inspiration calendars — the (fully in Japanese) and (in English). And don’t be afraid to leave some feedback on the posts — we’d love to hear your thoughts!
Kibana's Road to 6.0 and the Removal of Mapping Types;;/blog/kibana-6-removal-of-mapping-types;Tyler Smalley;December 20, 2017;Engineering;; When Elasticsearch in 6.0.0, it was a major breaking change that affected any application that uses indices with multiple types. Kibana is one such application, relying on multiple types to store objects like visualizations, dashboards, index patterns, and more. How did we successfully migrate Kibana to be compatible with Elasticsearch 6? We share this strategy with you below, and hope that it gives you ideas for how to convert your multi-type indices to single-type. The first step was deciding what the new mapping was going to look like. There are a few common alternatives to mapping types: We chose adding a custom type field and nesting the data under the name of the type. This allowed us to have fields with the same name under different types, which was previously a limitation. Here is a visual of what this Elasticsearch document transformation looks like: Once we had the mapping, the next step was migrating the data. When creating the new index, we need to ensure is , otherwise we will not be able to fall back to the single type. Additionally, 5.6 has an option which mimics the behavior in 6.0. After creating the new index with the desired mapping, we reindexed the data into the new format. To do this, we leverage the reindex API to transform the documents, setting the type field and nesting the previous data under the name of the type. Since IDs are only unique within a type, we also prefix the ID with the type (a rough sketch of such a reindex appears at the end of this post). Using an alias, we can swap out the index in a single atomic action without downtime. Details of this full migration process can be found in our . Kibana 5.6 is considered a compatibility release, allowing users to perform a rolling upgrade to 6.0 without downtime. This means Kibana 5.6 needs to seamlessly handle both single and multiple mapping types. To allow for this, we introduced a which has become the preferred way of programmatically interacting with Kibana data. This allows for a consistent interface, regardless of the underlying Elasticsearch document structure. In 5.6, we still default to using multiple types, but fall back to a single type when we identify the data has been migrated. Here are a few examples of how we built a fallback system. Create When creating a document we will receive a if the type does not exist. We receive this error after the data has been migrated, allowing us to re-try with the single-type format. POST /.kibana/doc/index-pattern:abc123/_create { "type": "index-pattern", "index-pattern": { "title": "Test pattern" } } Get Instead of making two requests, we can perform a single search to capture the document. Here we use a boolean query containing both of the formats. POST /.kibana/_search { "query": { "bool": { "should": [ { "bool": { "must": [ { "term": { "_id": "abc123" } }, { "term": { "_type": "index-pattern" } } ] } }, { "bool": { "must": [ { "term": { "_id": "index-pattern:abc123" } }, { "term": { "type": "index-pattern" } } ] } } ] } } } Delete When deleting a document, we use , providing the same query used for fetching a document. Update When using the _update API, we receive a if the document is missing. We receive this after the data has been migrated, allowing us to re-try with the single-type format. POST /.kibana/doc/index-pattern:abc123/_update { "doc": { "index-pattern": { "title": "My new title" } } } These changes allowed us to migrate the Kibana index at any time during a 5.6 installation. In X-Pack, we have made this process even easier for users through our .
If you have indices created in 2.x, you must reindex them before upgrading to Elasticsearch 6.0. In addition, the internal Kibana and X-Pack indices must be reindexed to upgrade them to the format required in 6.0. The Reindex Helper identifies these indices and provides a button to perform the reindex. Have a special use case or strategy for how to migrate your multiple type indices to single type? Share it with us! You can also ask us for advice on our .
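A rough sketch of the kind of reindex described in the post above: nesting each document under its old type, adding a type field, and prefixing the ID with the type. This illustrates the approach only; it is not the exact script Kibana used, and the destination index name is an assumption.

```bash
curl -XPOST 'localhost:9200/_reindex' -H 'Content-Type: application/json' -d'
{
  "source": { "index": ".kibana" },
  "dest":   { "index": ".kibana-single-type" },
  "script": {
    "lang": "painless",
    "source": "def t = ctx._type; def s = new HashMap(ctx._source); ctx._source.clear(); ctx._source.put(\"type\", t); ctx._source.put(t, s); ctx._id = t + \":\" + ctx._id; ctx._type = \"doc\";"
  }
}'
```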
Applications for Django Girls San Francisco Workshop Now Open;;/blog/applications-for-django-girls-san-francisco-workshop-now-open;Michelle Carroll;December 20, 2017;Culture;; We’re excited to be hosting a Django Girls workshop in San Francisco on Sunday, February 25! . If you’re not familiar with the organization, Django Girls is on a mission to inspire a love of programming for newcomers, and especially for underrepresented folks in tech (like women). The organization enables local teams of volunteers to set up one-day workshops, with an emphasis on achieving small successes in a supportive environment. It’s an organization and a mission we’re passionate about — democratizing technology, keeping things fun, and having fantastic documentation to boot. We strongly believe in the value of Django Girls. We are not only hosting this event, but we have sponsored 18 other events all over the world. This is our second year hosting a workshop around Elastic{ON}, and it’s been an amazing experience for everyone involved. (Want an additional perspective? .) As part of a distributed company, is a rare opportunity to bring many employees together in one place and work together on passion projects — like being coaches for the workshop. We’re also super-jazzed to be hosted by in their awesome space. While most of the folks who are using the Elastic Stack have some experience programming, we know that’s not universal — and folks who are in dev, ops, QA, analyst, or other programming-heavy roles often get asked for recommendations on learning how to code. Help us spread this education by getting the word out on this free workshop. Share this blog post or the , and don’t forget that applications close January 30! And finally, we recommend checking out the to see what workshops are coming up (and when they’re looking for coaches or applicants) and to get a sense of what it takes to organize one in your local region. Django Girls. Dream. Create. Code. We hope to see you there!
Smarter Machine Learning Job Placement in Elasticsearch;Smarter Machine Learning Job Placement in Elasticsearch;/blog/smarter-machine-learning-job-placement-in-elasticsearch;David Roberts;December 19, 2017;Engineering;; Ever since we , ML jobs have been automatically distributed and managed across the Elasticsearch cluster. To recap, you specify which nodes you’re happy for ML jobs to run on by configuring the and settings to on these nodes. Then, when you the cluster runs the job’s associated analytics process on one of those nodes, and the job will continue to run there until it’s closed or the node is stopped. Prior to version 6.1 this node allocation was done in a very simple way: a newly opened job was always allocated to the node running the fewest jobs subject to a couple of : Different ML job configurations and data characteristics can require different resources. For example, a single metric job generally uses very little resource, whilst a multi-metric job analysing 10,000 metrics will require more memory and CPU. But no account was taken of the expected or actual resource usage of each job when allocating jobs to nodes, which could lead to: To mitigate these problems, starting in version 6.1 ML will allocate jobs based on estimated resource usage. Each ML job runs in a separate process, outside of the Elasticsearch JVM. In 6.1 we have added a new , , to control the percentage of memory on a machine running Elasticsearch that may be used by these processes associated with ML jobs. This is a dynamic cluster setting, so the same number will apply to all nodes in the cluster, and it can be changed without restarting nodes. By default the native processes associated with ML jobs are allowed to use 30% of memory on the machine. ML will never allocate a job to a node where it would cause any of these constraints to be violated: For nodes where none of the hard constraints would be violated, we will continue to allocate jobs to the least loaded ML nodes. However, instead of number of jobs being the definition of load it is now estimated job memory. Job memory is estimated in one of two ways: The first method is preferred, but cannot be used very early in the lifecycle of a job. The estimate of job memory use is based on actual model size when the following conditions are met: Once jobs are “established” according to these criteria, we should be able to make pretty good decisions about whether opening a new job is viable. However, if many jobs are created and opened around the same time then we will tend to be restricted by the configured in the . Before version 6.1 our default was 4GB, and this was excessive in many cases. Therefore, in version 6.1 we have cut the default to 1GB. If you are creating advanced jobs that you expect to have high memory requirements we’d encourage you to explicitly set this limit to a higher value when creating the job. And there should be less scope to hog resources if you accidentally create a job that would use a lot of memory if it were allowed to run unconstrained. Similarly, if you’re creating jobs that you expect to have very low memory requirements, we’d encourage you to set in the to a much lower value than the default. We’ve done this in the job creation wizards in Kibana: single metric jobs created using the wizard now have their set to 10MB, and multi-metric job created by the UI wizard have it set to 10MB plus 32KB per distinct value of the split field. 
Because of the smarter limiting of ML jobs based on memory requirements, in version 6.1 we’ve increased the default value for from 10 to 20. Of course, if you have large jobs or little RAM then the memory limit will kick in and you won’t be able to open 20 jobs per node. But if you have many small jobs then you’ll no longer be artificially restricted. If you’ve previously customized the value of then you may wish to revisit your setting taking account of the new functionality. During rolling upgrades to version 6.1 ML jobs might be running in mixed version clusters where some nodes are running version 6.1 and others pre-6.1. In such clusters some nodes will know enough to apply the new allocation logic and some won’t. Additionally, it is theoretically possible for the node stats API to fail to determine the amount of RAM a machine has. Rather than try to do anything clever in these scenarios, the following simple rule will apply: if any ML node in the cluster is unable to participate properly in the memory-based node allocation process then the pre-6.1 count based allocation will be applied to all ML node allocation decisions. For people running clusters on supported operating systems where all nodes have been upgraded to version 6.1 this should not be a problem. .
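A hedged sketch of the knobs discussed in the post above, using the 6.1 ML API and setting names as we understand them (the names are not spelled out in the post, so treat them as assumptions and check the 6.1 documentation): capping a small job's model memory at creation time, and adjusting the share of machine memory that ML processes may use.

```bash
# Create a small job with an explicit model memory limit (values illustrative)
curl -XPUT 'localhost:9200/_xpack/ml/anomaly_detectors/low_memory_job' -H 'Content-Type: application/json' -d'
{
  "analysis_config": {
    "bucket_span": "5m",
    "detectors": [ { "function": "mean", "field_name": "responsetime" } ]
  },
  "analysis_limits": { "model_memory_limit": "10mb" },
  "data_description": { "time_field": "timestamp" }
}'

# Dynamic cluster setting controlling how much machine memory ML processes may use (default 30%)
curl -XPUT 'localhost:9200/_cluster/settings' -H 'Content-Type: application/json' -d'
{ "persistent": { "xpack.ml.max_machine_memory_percent": 30 } }'
```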
Kibana 6.1.1 released;;/blog/kibana-6-1-1-released;Court Ewing;December 19, 2017;Releases;; Today, we released Kibana 6.1.1 with a fix for a high severity security vulnerability in the Time Series Visual Builder. All administrators of Kibana 6.1.0 are urged to upgrade Kibana immediately. Versions prior to 6.1.0 are not affected. If you had any Kibana 6.1.0 instances on , we’ve automatically upgraded them, so no further action is required. For folks that cannot upgrade from 6.1.0 at this time, you can disable time series visual builder entirely by specifying in kibana.yml and restarting Kibana. Note, this will require a full “optimize” run, which can take a few minutes. Math aggregations and remote code execution In Kibana 6.1.0, we released a new feature for “math aggregations” in the Time Series Visual Builder which allowed users to apply mathematical operations to their TSVB results. Unfortunately, this new feature has a vulnerability that could allow an attacker to execute arbitrary code on the Kibana server. We’ve in 6.1.1. Removing a feature is never something we take lightly, especially in a patch release, but the issue is severe and there isn’t a reliable way to permanently fix it. We do want to have this sort of math capability in Kibana at some point, but we need to take a more holistic view on its security before releasing it again. There are a couple of other bug fixes in this release as well, so check out the for all the details. If you’re not using , head on over to our page to get the release now.
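The post above refers to a kibana.yml setting for disabling Time Series Visual Builder, but the setting name was lost in this export. As an assumption (the TSVB plugin in Kibana 6.x is named "metrics"), the workaround is usually written as below; verify against the 6.1 documentation before relying on it, and note the kibana.yml path varies by install method.

```bash
# Assumed kibana.yml line to disable Time Series Visual Builder, then restart Kibana
echo 'metrics.enabled: false' | sudo tee -a /etc/kibana/kibana.yml
sudo systemctl restart kibana
```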
Logstash Lines: Update for December 19, 2017;;/blog/logstash-lines-2017-12-19;Andrew Cholakian;December 19, 2017;The Logstash Lines;; Hello Logstashers! We're glad to present you all the past couple week's news in convenient digest form! This past week has been mostly about fixes. We have some big picture things we're working on, but nothing in enough shape to share yet.
Default Password Removal in Elasticsearch and X-Pack 6.0;Default Password Removal in X-Pack 6.0;/blog/default-password-removal-elasticsearch-and-x-pack-6-0;Jay Modi;December 18, 2017;Engineering;; The Elasticsearch team takes pride in making software that is easy to get started with, which allows developers to make progress on their projects at a faster pace. The team wanted the same experience for X-Pack security features and out of this desire, the addition of built-in user accounts was born. X-Pack ships with a built in administrator account and accounts for Kibana and Logstash system users. In 5.x, these accounts have a default password of ‘changeme', which was chosen with the hopes that users would heed the advice embedded in the password and, well, change the password. Hope is not good enough when it comes to securing applications: relying on hope means we assume our users know about these accounts and the default password and also know why these need to be changed. As a company, relying on hope is like rolling the dice for becoming the next major piece of software featured in the news cycle as the culprit responsible for a bad data leak. In order to provide better security, we made for 6.0 that removed the default password altogether. The removal of the default password has the effect of adding a single step to the getting started process for Elasticsearch and we felt that this tradeoff was the right one to make when it came to shipping software that is secure. Getting this process down to a single step was not easy: there were a lot of ideas and a lot of back-and-forth discussions on how we accomplish this. The solution makes use of an auto-generated seed value on each node. This seed value serves as the initial password for the elastic user. The seed value alone could have been a fine solution but it has its own issues: the most important being that we have a different password for the elastic user on each node. In terms of usability, the seed value as the elastic password would complicate the getting started experience as it would require additional manual steps to configure passwords for other users such as the `kibana` user. More work was needed to make getting started a nice experience. Moving beyond the seed value, a new tool, ‘’, has been added to make the initial password setting as easy as possible. The tool has both an interactive mode where the user can provide their own passwords and an automated mode that sets the passwords to a random value, which is then sent to standard out. Let’s take a look at how easy it is to get started with X-Pack:
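The name of the new tool is elided in the post above; assuming the X-Pack 6.0 layout (where X-Pack is installed as an Elasticsearch plugin), getting started looks roughly like the sketch below. Paths may differ by install method.

```bash
# Interactive mode: choose your own passwords for the built-in users (elastic, kibana, logstash_system)
bin/x-pack/setup-passwords interactive

# Or automated mode: generate random passwords and print them to standard out
bin/x-pack/setup-passwords auto
```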
This Week in Elasticsearch and Apache Lucene - 2017-12-18;;/blog/this-week-in-elasticsearch-and-apache-lucene-2017-12-18;Clinton Gormley;December 18, 2017;This week in Elasticsearch and Apache Lucene;; Welcome to ! With this weekly series, we're bringing you an update on all things Elasticsearch and Apache Lucene at Elastic, including the latest on commits, releases and other learning resources. automatically optimises away the need to track versions of in-memory buffered documents while indexing if all documents in the ram buffer are guaranteed to have no duplicates and all use auto-generated IDs. This reduces the GC overhead drastically in high-throughput scenarios (up to 50%) and offers a depending on the workload. This change will come in 6.2. Elasticsearch 6.2.0 will be the first release of Elasticsearch to officially support JDK 9. Elasticsearch 6.2.0 will run out-of-the-box on both JDK 8 and JDK 9. We as of the JDK, but Elasticsearch will move forward with the JDK ecosystem. When , releases of Elasticsearch will stop supporting JDK 9: we intend to support JDK 10 but there is no guarantee of that at this time. Support for JDK 8 will continue until end-of-life in of the JDK. A new () has been added to master and is planned to be backported to 6.2. The ranking evaluation API can be used to evaluate the quality of ranked search results over a set of typical search queries. Users can supply a set of typical queries together with a list of manually rated documents, and the API will perform the queries and calculate common information retrieval metrics like mean reciprocal rank, precision or discounted cumulative gain on it. The API is currently marked as experimental and will probably change a bit in the foreseeable future. More details about the current state can be found in the . Ranking via the API is a very manual process at the moment, so we only expect to see traction around this feature once we have a UI to make interaction much more point-and-click. Brainstorming in progress with the Kibana team. Changes in 5.6: Changes in 6.0: Changes in 6.1: Changes in 6.2: Changes in 7.0: Apache Lucene There is an ongoing to release Lucene 7.2.0, which is going well so far. and .
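A hedged sketch of what a ranking evaluation request looks like. The API is experimental and its exact shape may differ from this; the index, query, and ratings here are illustrative assumptions.

```bash
curl -XGET 'localhost:9200/my_index/_rank_eval' -H 'Content-Type: application/json' -d'
{
  "requests": [
    {
      "id": "laptop_query",
      "request": { "query": { "match": { "title": "laptop" } } },
      "ratings": [
        { "_index": "my_index", "_id": "doc1", "rating": 1 },
        { "_index": "my_index", "_id": "doc2", "rating": 0 }
      ]
    }
  ],
  "metric": { "mean_reciprocal_rank": { "relevant_rating_threshold": 1 } }
}'
```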
Keeping up with Kibana: This week in Kibana for December 18, 2017;;/blog/keeping-up-with-kibana-2017-12-18;Jim Goodwin;December 18, 2017;Kurrently in Kibana;; Welcome to This is a weekly series of posts on new developments in the Kibana project and any related learning resources and events. 6.1 has a new homepage with even more dashboard customization options and input controls. — elastic (@elastic) Hi all, The big event last week was shipping the 6.1 release of Kibana, you can see a link to the blog post in the tweet above.Work on integrating into Kibana continues and it hopefully will be available for use in the 6.2 release. One of the important parts of EUI is its documentation, including a new set of guidelines for writing content for Kibana. You can see some of the work in progress here: I'd also like to point out some recent Webinars featuring members of the Kibana team:With some significant changes to the visualize API, using existing Kibana visualizations in your own plugins has become much easier. Join Alex Francoeur, Thomas Neirynck, and Peter Pisljar for a live demonstration to learn how to develop visualizations in Kibana. Version 6.0 introduces many new features and improvements across all components of the Elastic Stack. In this webinar, Archana Sriram, Mike Baamonde, and George Kobar will walk you through the considerations, best practices, and caveats for upgrading your Elastic Stack to 6.0. That's all for this week. Cheers, Jim
Brewing in Beats: Monitor RAID status with Metricbeat;;/blog/brewing-in-beats-monitor-raid-status-with-metricbeat;Monica Sarbu;December 14, 2017;Brewing in Beats;; Did you know that is already available? Try it and let us know what you think. If you are curious to see the Beats in action, we just published the . Metricbeat: RAID metricsetAdded upon request from our own Cloud team, this polls and records software RAID specific metrics. The metricset is part of the system module and is planned to be released in 6.2. New community Beat: TracebeatCreated by , this sends traceroute pings and indexes the results into Elasticsearch. Other changesRepository: elastic/beatsAffecting all BeatsChanges in master: Changes in 6.1: Changes in 6.0: MetricbeatChanges in master: Changes in 6.1: Changes in 6.0: PacketbeatChanges in master: FilebeatChanges in master: WinlogbeatChanges in master: Changes in 6.1: Changes in 6.0: ProcessorsChanges in master: Machine learning jobsChanges in master: TestingChanges in master: Changes in 6.1: Changes in 6.0: InfrastructureChanges in master: Changes in 6.1: Changes in 6.0: PackagingChanges in master: DocumentationChanges in 5.3: Changes in master: Changes in 5.6: Changes in 6.1: Changes in 6.0: Repository: elastic/gosigarChanges in master:
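A small sketch of how the new metricset would be enabled, assuming it follows the usual system-module configuration in Metricbeat (the metricset name and layout are assumptions based on the post, which targets 6.2):

```bash
# Append an illustrative system/raid module block to metricbeat.yml
cat >> metricbeat.yml <<'EOF'
metricbeat.modules:
  - module: system
    metricsets: ["raid"]
    period: 10s
EOF
```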
Elastic Advent Calendar 2017, Week 2;;/blog/elastic-advent-calendar-2017-week-two;Michelle Carroll;December 14, 2017;;; Week two of the has been a busy one! If you’re just joining us, this calendar is how we’re celebrating the end of the year — by sharing a daily Elastic Stack tip in our community . You can catch up on the first week by heading . In this batch, we’ve got posts in Chinese, English, German, Portuguese, Italian, and Japanese, and covered everything from text analysis to migrating data with the reindex API to analyzing your hockey game and cryptocurrencies with the Elastic Stack. You can follow along live by checking out or subscribing to the , or watch for the daily tweeting on . Here’s a sample of what we’ve posted from our second week Dec 8: by Medcl Zeng Dec 9: by Jordan Sissel Dec 10 by Philipp Krenn Dec 11: by Thiago Souza Dec 12: by Atonio Bonuccelli Dec 13: by Aaron Aldrich Dec 14: by Jun Ohtani ! If you’re looking for other great pieces of reading, we recommend checking out our inspiration calendars — the Elastic Stack calendar on Qiita (fully in Japanese) and SysAdvent (in English). And don’t be afraid to leave some feedback on the posts — we’d love to hear your thoughts!
How to Build a Site Search UI;;/blog/how-to-build-a-site-search-ui;Sam Reid;December 14, 2017;Engineering;; Note: The original extended post is available on the . Search nirvana: A powerful backend + well-designed UI Depending on the purpose and scale of your website, search can be a critical feature that enables your users to quickly find the information they need. Elasticsearch makes it significantly easier to architect a search engine that delivers relevant results, but building your search backend is only part of the work of implementing a search experience. Without an intuitive search interface, your users may not get the full value of your search engine. What we’ve learned as search providers At Swiftype (an Elastic Company), we provide search as a service to completely handle the backend of your search engine, and we also help you to build well-designed search UIs. Swiftype is built on the Elastic Stack which has enabled us to support over 10,000 production search engines and serve over 5 billion queries a month. It’s safe to say that we’ve learned a thing or two about search over the years as we’ve helped small and large companies like Lyft, AT&T, Twilio, Asana, and Samsung provide top-notch search experiences. If you’d like to learn more about Swiftype’s architecture and use of the Elastic Stack, . In our experience helping these organizations and many others with search, we’ve been able to see what works well when it comes to search UIs and apply those learnings to both our out-of-the-box search interface as well as our jQuery libraries that we support for creating fully custom UIs. While building a great search interface can take some effort, we’ve consistently seen companies reap the rewards of implementing well-designed search UIs, which include revenue growth and better user engagement. Implementing your search UI Swiftype can help you to get a jump start on building your next search experience, including the UI. In 3 steps, you can have a functioning search UI implemented: 1. Index your data into Swiftype 2. Tune your search results 3. Implement your search bar
Machine Learning 6.1.0 Released;;/blog/machine-learning-6-1-0-released;Steve Dodson;December 14, 2017;Releases;; On Demand Forecasting Smarter Node Allocation for ML Jobs Automatic Job Creation for Known Data Types Data Visualizer This tool summarizes the key features in the data, such as cardinality of fields, sparsity and counts of key values. Moving forward we will extend this view to help you create more effective analysis configurations for time series ML jobs: Population Analysis Job Wizard Job Groups Overall Buckets GET _xpack/ml/anomaly_detectors/job-1,job-2,job-3/results/overall_buckets { "overall_score": 75, "start": 1403532000000, "top_n": 2 } { "count": 1, "overall_buckets": [ { "timestamp": 1403532000000, "bucket_span": 3600, "overall_score": 75.0, "jobs": [ { "job_id": "job-1", "max_anomaly_score": 80.0 }, { "job_id": "job-2", "max_anomaly_score": 70.0 }, { "job_id": "job-3", "max_anomaly_score": 14.0 } ], "is_interim": false, "result_type": "overall_bucket" } ] }
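On Demand Forecasting listed above is driven by a dedicated forecast endpoint. A rough sketch, reusing the job-1 name from the example above (the endpoint path and body parameter are our assumption for the 6.1 API; check the ML API docs for your version):

POST _xpack/ml/anomaly_detectors/job-1/_forecast
{
  "duration": "1d"
}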
You know, for visualizing your Logstash pipelines;;/blog/logstash-pipeline-viewer-6-0;Andrew Cholakian;December 13, 2017;Engineering;; Logstash’s strength is its flexibility. With its minimalist syntax and rich set of plugins, users have been able to conjure up all kinds of Logstash pipelines — from to . As you create increasingly sophisticated pipelines, you may have discovered that understanding these pipelines becomes harder. You might want to see the overall shape of your pipeline, understand where branches in data flow might be, and how the various parts of your pipeline perform under actual production conditions. To help you answer some of these questions, we have introduced the Logstash Pipeline Viewer UI. Concepts and Terminology Before we talk some more about the Pipeline Viewer, it’s necessary to understand some associated concepts and terminology first. Every Logstash pipeline has an . If you define your pipelines using the or command-line options, your pipelines will have their ID set to “main”. If you define your pipelines or , you will be able to set your own pipeline IDs to something more descriptive like, say, “Apache Access Logs Processing”. For every (uniquely ID’d) pipeline, there can be multiple of that pipeline. What differentiates one version of a pipeline from another version of that pipeline are its contents. For instance, say you created a pipeline with ID = “p1” containing one input, one filter, and one output plugin. At this point, only one version of “p1” exists. At some point in the future, say you modified “p1” so it contained the same plugins as before but also included an additional filter plugin. Now “p1” would have two versions in existence - the original one and the changed one. Since the version of a pipeline is based on its contents, Logstash will automatically calculate a version for you. It will look something like this: . This version currently only shows up in the Monitoring UI in Kibana (where the Pipeline Viewer feature lives), and even there it often shows up in a shortened form for ease of readability. The Tour Great! Now that you understand pipeline IDs and versions, let’s dive into the Pipeline Viewer UI itself. We assume that you have one or more Logstash pipelines currently running and to Elasticsearch. Navigate to the Monitoring tab in Kibana and scroll down to the Logstash section. You might notice that there is a new item in this section called “Pipelines”. Clicking here shows you all pipelines across your Logstash deployment that have been running during the time window set by the Kibana Time Picker (top right of screen). For each pipeline the list of its versions that existed during the Kibana Time Picker time window are shown. Clicking on a pipeline’s version takes you to the Pipeline Viewer, the centerpiece of this blog post. It shows a rendering of that specific version of the pipeline. The pipeline is visualized as a directed acyclic graph, with vertices and edges. Vertices in the graph can represent one of three elements in a Logstash pipeline: plugins, if conditions, or the queue in Logstash that exists between the input and filter stages of a pipeline. Edges represent paths that events can take as they travel through the pipeline. Plugin vertices show the ID of the plugin (if one is specified by the pipeline creator via the property), the plugin’s throughput (in events per second), and the latency (in milliseconds) introduced by the plugin as it processes events passing through it. 
If condition vertices show the condition being tested. They have two types of edges emanating from them: “true” or “T” edges, representing the path an event would take if the condition is met, and “false” or “F” edges, representing the path an event would take if the condition is not met. Today we do not show metrics for if condition vertices or the queue vertex but this is something we plan to add in the future. Applications Now that you know your way around the Pipeline Viewer, here are some of its practical applications. As mentioned at the start of this blog post, you can use the Pipeline Viewer to get a bird’s eye view of your pipelines. This is especially helpful for more sophisticated pipelines. From this view you can also zoom in closer and inspect specific plugins. This will help you identify performance bottlenecks so you can work towards alleviating them. Finally, as the Pipeline Viewer visualizes a specific version of a pipeline at a given time, you can make changes to the pipeline and compare the effects of these changes with the version prior. Limitations and Future Plans The Pipeline Viewer uses a force-directed graph layout algorithm to optimally lay out the vertices and edges representing the pipeline. For clarity, the algorithm tries to prevent overlap of vertices and minimize overlap of edges. It tries to solve for these constraints in a reasonable amount of time, a few seconds. Turns out that such an algorithmic layout problem is a tough one to solve! So the algorithm comes up with a “best effort” solution. What this means for you, the user, is that depending on the complexity of your pipeline, you may see layouts ranging from obvious to downright messy. We are continually tweaking this algorithm and even considering alternatives to make layouts more readable and therefore useful for users in the future. In addition to better layouts, you can also expect to see metrics for the queue and if conditions in the future. We also plan to add a detail panel for each vertex. This panel will let you see the vertex’s complete configuration as well as charts visualizing the vertex’s performance metrics over time. We hope you are as excited about the Pipeline Viewer as we are. Please play with it (requires an X-Pack Basic license, which is free) and so we can continue to make the right improvements to it. And watch this space for an “under the covers” look at the Pipeline Viewer UI, in which we’ll talk about the engineering aspects of how it’s built, the challenges we faced building it, and some of the options we’ve explored over the course of its evolution.
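As a reference for the descriptive pipeline IDs discussed above, here is a minimal pipelines.yml sketch for Logstash 6.0+ (the IDs and config paths are illustrative, not taken from the post):

# pipelines.yml - each entry defines one pipeline with its own ID
- pipeline.id: apache-access-logs-processing
  path.config: "/etc/logstash/conf.d/apache_access.conf"
- pipeline.id: main
  path.config: "/etc/logstash/conf.d/main.conf"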
Elasticsearch 6.1.0 released;Elasticsearch 6.1.0 released;/blog/elasticsearch-6-1-0-released;Clinton Gormley;December 13, 2017;Releases;; Today we are pleased to announce the release of , based on . This is the latest stable release, and is already available for deployment on , our Elasticsearch-as-a-service platform. Latest stable release in 6.x: You can read about all the changes in the release notes linked above, but there are a few changes which are worth highlighting:
Logstash 6.1.0 Released;;/blog/logstash-6-1-0-released;Andrew Cholakian;December 13, 2017;Releases;; Logstash 6.1.0 has launched! We've got some great new features to talk about! Read about it here, or just head straight over to our and give it a shot! However, you may want to take a minute to read about and first. Read on for what's new in Logstash 6.1.0. We’re proud to announce a great new way to extend Logstash functionality in 6.1.0. Complex modification of events in Logstash is now much easier due to our new feature, file based Ruby scripting via the Logstash Ruby filter. While the Ruby filter already lets you use custom ruby code to modify events, that code must live inside the Logstash configuration file itself, which doesn’t work well for longer pieces of code, and is hard to debug. Additionally, there’s never been a good way to reuse or test that code. File based Ruby scripts can take arguments, letting you reuse code within your Logstash configs easily. This new feature lets you write Ruby code in an external file, with tests inline in that same file, and reuse that anywhere you’d like. Another nice feature here is that we can generate accurate line numbers in stack traces for exceptions in file based Ruby scripts, making them much easier to debug. The full details for configuring this are available in the . For a short example see below: To configure the ruby filter with a file use: filter { ruby { # Cancel 90% of events path => "/etc/logstash/drop_percentage.rb" script_params => { "percentage" => 0.9 } } } The file 'drop_percentage.rb' would look like: def register(params) @should_reject = params["reject"] end def filter(event) return [] if event.get("message") == @should_reject event.set("field_test", rand(10)) extra_processing(event) [event] end def extra_processing(event) # .. end test "non rejected events" do parameters do { "reject" => "hello" } end in_event { { "message" => "not hello" } } expect("events to flow through") do |events| events.size == 1 end expect("events to mutate") do |events| events.first.get("field_test").kind_of?(Numeric) end end Logstash 6.1.0 brings some exciting new changes to the Logstash internals. We’ve been working on a full rewrite of the internal execution engine in Logstash. This rewrite moves the core execution logic from JRuby to Java/JVM Bytecode. With this approach we’ll be able to pave the way to more performance improvements in the future, as well as the ability to write optimized Logstash plugins in any JVM language. This feature is currently disabled by default, and users should note that it is experimental and not yet ready for production. To enable this feature, you’ll need to use the '--experimental-java-execution' flag. We encourage users to try this flag out in test and staging environments and report any bugs found. Our hope is to make this the default execution method sometime in the 6.x timeframe.
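The experimental Java execution engine described above is switched on from the command line. A minimal sketch, assuming a pipeline config at an illustrative path:

bin/logstash --experimental-java-execution -f /etc/logstash/conf.d/my_pipeline.conf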
Beats 6.1.0 released;;/blog/beats-6-1-0-released;Monica Sarbu;December 13, 2017;Releases;; We’re pleased to announce the Beats 6.1.0 release. This is the latest stable version and it comes with lots of new modules and an exciting Autodiscovery feature. Docker AutodiscoveryBeats 6.1 brings in the first phase of autodiscovery support. Autodiscovery allows the user to configure providers, that watch for system changes and emit events to a common bus. Based on these events, the Autodiscovery system detects situations when there is something new that we can monitor and instantiates new Beats modules for it. In general, Autodiscovery allows the Beats to react and adapt to changes in the ever more dynamic infrastructures. The first provider watches for Docker events. It supports config mapping from container metadata to config templates, so new modules are created when a container starts. metricbeat.autodiscover: providers: - type: docker templates: - condition: equals.docker.container.image: redis config: - module: redis metricsets: [info, keyspace] hosts: ${data.host}:${data.port} The above is an example configuration that instantiates the Metricbeat Redis module every time a new redis container (defined by having the redis image) is started. Note that the connection information (host/ports) is filled in by the autodiscovery support via a template. Future releases will add more Autodiscovery providers, for example for Kubernetes events and package managers. New Metricbeat and Filebeat modulesEach Beats release adds a few new Metricbeat and Filebeat modules, but 6.1 really sets the bar higher. Many of these modules are contributed by our users (Thank You!). Let’s go through the list: TLS support in PacketbeatPacketbeat 6.1 adds , which is one of the most anticipated Packetbeat features. It doesn’t mean decrypting traffic, but it parses the initial handshake and extracts data like ciphers supported by the client and the server, the client and server certificate chains, the subject alternative name (SAN), validity dates, raw certificates, and so on. This data is super valuable for debugging TLS issues and also for intrusion detection and auditing. The implementation also comes with support for the extension to TLS, which allows Packetbeat to detect, for example, whether HTTP/2 or HTTP/1 are used as an application protocol on top of the TLS connection. Docker JSON-file prospector in FilebeatFilebeat 6.1 comes with an (experimental) Docker prospector that implements the default . Filebeat could already read Docker logs via the log prospector with JSON decoding enabled, but this new prospector makes things easier for the user. It abstracts the format, so there is no need to manually configure JSON decoding. Here is an example config, which captures the logs from a single container specified by its ID: prospectors: - type: docker containers.ids: - c3ec7a0bd9640151a768663b7e78c115d5b1a7f87fba572666bacd8065893d41 It also parses the timestamp from the JSON file, something that wasn’t previously possible with Filebeat alone (it required Logstash or Ingest Node). This new prospector works great with the Docker Autodiscovery provider. New Auditbeat dashboards Auditbeat 6.1 comes with several in the default configuration file, which makes it easier to get started with. To match the use cases, we also have three new dashboards: FeedbackIf you want to make use of the new features added in Beats 6.1.0, please , install it, and let us know what you think on Twitter () or in our .
Kibana 6.1.0 is released;;/blog/kibana-6-1-0-released;Jim Goodwin;December 13, 2017;Releases;; You may be thinking But wait, you just released 6.0?... I know, right! But 6.1.0 has been percolating for months and now it is here. New in this release we have: You can now create input control visualization components which when placed on a Dashboard allow users to select particular values from a terms aggregation from a multi-select drop down control or select a range from a min/max aggregation using a range slider control. This will make it easy to guide users to important filtering values for the dashboard and make it simple for them to apply filters and explore the information on the dashboard.Kibana has a homepage! Finally clicking on the Kibana logo in the upper left will do something useful! And finally new users will get some great pointers as to what to do with Kibana when they launch it for the first time! Please watch this space, there is a lot more coming...Beginning with X-Pack monitoring 6.1, the Monitoring UI will automatically use to load data from remote clusters defined using the Elasticsearch connection's remote cluster list . That means if you only define elasticsearch.url in your kibana.yml, and configure your dedicated monitoring clusters as remote clusters of it, then the Monitoring UI will detect and show all clusters! We also took the time to optimize the experience so that it routes requests to specific monitoring clusters whenever possible. This provides multiple advantages over the existing behavior:Using Cross Cluster Search is now the preferred way to talk to a dedicated monitoring cluster because of these benefits and simplifications. We hope that it helps you to both improve your Elastic Stack monitoring and simplify it at the same time.Newly introduced visualizations can now be part of labs-mode. Visualizations in labs-mode introduce new more cutting-edge functionality and can be subject to change across minor releases. Labs-mode can be turned off in the advanced settings. Labs-visualization will then no longer be available to the user. The Time Series Visual Builder is not part of labs-mode, it continues to be an experimental feature. The input controls are the first to be flagged as a lab visualization.Pie charts now support data labels making it easy to understand the values being presented without having to look back and forth to a legend.We have improved the use of Region Maps for deployment in environments without internet access. Similar to the Coordinate Map visualization, the Region map can now use a WMS-service as a base-layer. Admins can now also setup Kibana to opt-out of connection to the Elastic Maps Service. Users can now opt-out of having the visualization display warnings.Reporting now has the ability to render Dashboards in a WYSIWYG manner to PDF preserving the locations and sizes of panels on the dashboard.There are now additional options for customizing the content on Dashboards. We've added a new option to use margins to add separation between Dashboard panels. We've also added the capability to customize the panel titles or hide them altogether. The Management application now supports managing the license for your cluster including seeing your current license level and expiration information, links out to obtain a Basic or a paid license and support for uploading and installing a new license. 
We've also made sure that you'll be able to log into Kibana to use this tool even if your license has expired.Some settings are sensitive, and relying on filesystem permissions to protect their values is not sufficient. For this use case, Kibana provides a keystore, and the kibana-keystore tool to manage the settings in the keystore. [More information here: ] Please , try it out, and let us know what you think on Twitter () or in our . You can report any problems on the .
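A minimal sketch of using the kibana-keystore tool mentioned above (elasticsearch.password is shown as an example of a sensitive setting; check the docs for which settings your version accepts in the keystore):

bin/kibana-keystore create
bin/kibana-keystore add elasticsearch.password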
Elastic Stack 6.1.0 Released;;/blog/elastic-stack-6-1-0-released;Tyler Hannan;December 13, 2017;Releases;ja-jp; 6.1.0 is here. Fresh on the heels of the 6.0.0 GA release, we are pleased to introduce you to the capabilities of 6.1.0. You should download it now or use it on (your favourite hosted Elasticsearch and Kibana provider. The line between products and features is often a blurry one. And we feel that pain – or experience that delight – as keenly as many. When there are so many features to highlight in a release, where do you even start? Either you craft the next great novel or you choose to provide links to details. Happy reading and…more importantly…happy searching, analyzing, and visualizing. Let’s start with some features that are, without question, worth mentioning: APM After announcing that we joined forces with Opbeat, and an alpha (in the 6.0 timeframe), we are super pleased to share that Elastic APM is now in beta. This includes not only all of the goodness we have described in the past, but also features a brand new UI. It is available in X-Pack Basic (free!) and you can read more information in the . Machine Learning Unsupervised, Automated, Expedited…adjectives abound when describing machine learning solutions. But, as our team has said, Elastic Machine Learning ‘catches what you might miss, all by itself.’ In 6.1.0 this expanded to include a series of substantive new features including On Demand Forecasting (based on the past, what values would you expect in the future), smarter allocation for efficiently assigning jobs to ML nodes, and automatic job creation for known data types. The fun doesn’t stop there, and all the features are described in the post. Elasticsearch The summary, or aggregation, of features is in the . Kibana Visualize the future of interacting with your data in the because there is MUCH more than can be listed here. Logstash Grok the details in the . Beats If you want all the details, ‘Go’ read the . Get It Now!
Custom Region Maps in Kibana 6.0;;/blog/custom-region-maps-in-kibana-6-0;Mark Walkom;December 12, 2017;Engineering;; Grab Your Cartometers In Kibana 5.5 we added the ability to define your own , which are also known as choropleth maps. This allows users to define custom geo-boundaries as , then overlay them on the Elastic Map Service to display aggregations for custom geographical areas natively in Kibana. This is a simple yet very powerful method for providing localised insights from your datasets. We've touched on this previously in a , but let's dig into this some more and run through a practical example of how to deploy it step by step in the Elastic Stack! If you'd like to replicate this blog post as you read it, the code for doing so is all . Ready, Set…. Before we start, let's define a few words that we will use throughout this post that you may be unfamiliar with, or have heard elsewhere in different contexts: Sourcing Your Map has a number of high quality maps and for this post we'll be using the Australian State file. If you download the file, extract it and take a look in it, you will find a repetitive structure, with the geoshapes we need and other fields that we will use later. There are a number of other sources out there that build and provide geojson files for you to download and use. Your favourite search engine can help locate them for you. Ultimately, if you can't find what you want on the internet then you can build your own geojson files. That's outside the scope of this blog post, but luckily exploratory.io themselves have an excellent blog post on how to do that right ! Configuring Your Source Now that we have our geojson file, we need to let Kibana know how to read it and map the geoshapes it holds onto the tiles that the Elastic Map Service provides. kibana.yml Using the extracted aus_state.geojson from the Australia States archive, we need to configure some custom settings in your kibana.yml file. Open the yaml file in your favourite editor and add this on the end: # Custom Region Maps regionmap: layers: - name: Australian States url: http://localhost:8000/aus_state.geojson attribution: https://exploratory.io/maps fields: - name: STATE_NAME description: State Name These settings are explained in of the documentation, but let's break it down here: The fields.name we define above is extremely important. It tells Kibana how to take the documents in Elasticsearch and then figure out which shapes in the geojson file it needs to place that document for the aggregation results to be displayed. We will run through this in detail and step by step, so don't worry if that's a bit confusing for right now. Serving Up The Geojson File The last step we need to do here is to serve the geojson file from a web server, as was defined in url. For this post we kept it simple and used this awesome little tool from npm, , and ran this command via our shell from the same directory that the aus_state.geojson lives in: http-server --cors=Access-Control-Allow-Origin -p 8000 You can test this by opening from your browser, and you should see the contents of the geojson file. There are also dedicated products to serve your geo-data, like that would be better suited for ongoing, production level deployments. We'd definitely recommend investigating them as more permanent solutions. Deploying Your Configuration It's all been pretty text heavy so far, especially for what is ultimately a visualisation. So let's take a look at what this all translates to in the Kibana UI.
The Aggregation Settings As with most things in Kibana, we need to build an aggregation to show our data on our custom geoshapes. For this example dataset we will change the default Count Aggregation, under Metrics, to a Max and select the Population field to run the agg on. Then we Bucket the data using a Terms Aggregation on the Name field. Here's what we mean: Note - these are shown side-by-side to save room on this page. In Kibana they are displayed as a single vertical column on the left of the browser window. The Map Settings Now we have to tell Kibana to use the regionmap settings as we defined in the kibana.yml. Under the Options tab we need to change the default Vector map to point to the one we configured, this is where the name field from above is used, so select Australian States. After that we select the Join field, which you have probably already guessed, will show the fields.description value of State Name. Then we hit the Apply changes button and…. The Outcome Let's break and explain something we briefly touched on earlier, remember when we talked about the inner-join? Well, at this point Kibana has: However! Most importantly to all this is what happens between steps 2 and 3, where we run this inner join we keep referring to. Let's break that down as it's super important to understand if you want to start building these maps for your own datasets. Joint Pain In our aus_state.geojson file we have a number of geoshapes that represent the boundaries of the Australian States (and Territories). Each of those has a STATE_NAME field with, funnily enough, the name of the state/territory in it. For any custom regionmap to display the data correctly, we need a field in each document we've indexed into Elasticsearch to have one of the same values as is in the specified join field from the geojson file. Therefore, if we have these values in the geojson file: grep STATE_NAME aus_state.geojson STATE_NAME: New South Wales, STATE_NAME: Victoria, STATE_NAME: Queensland, STATE_NAME: South Australia, STATE_NAME: Western Australia, STATE_NAME: Tasmania, STATE_NAME: Northern Territory, STATE_NAME: Australian Capital Territory, STATE_NAME: Other Territories, And if we create a document in Elasticsearch like so: PUT aussiestates/doc/nsw { Name: New South Wales, Population: 7757800 } We then create a terms aggregation on the `Name` field (i.e. step 2). Kibana takes the value of the `Name` field and look in the geojson file against the field we defined in `fields.name`, which is `STATE_NAME`. Given it finds an exact match, it adds the max value in the `Population` field to the `New South Wales` bucket (i.e. step 3). Then it takes the custom geoshape and transposes it onto the tilemap service with the results of the aggregation. Then it repeats this over and over until all the documents have been processed and we have the final results being displayed (step 4). And what does that all translate to? Glorious, sunburnt Australia! Some places more than others : ) Troubleshooting space-not-tabs.yml One general point we will start with is that the configuration file for Kibana is yaml formatted. That means that spaces are super important for nested definitions like the one we use here. If you run into problems then it's always good to check your indentation is correct and uses spaces and not tabs. There's Nothing On My Map! If you've still gotten this far but your maps are still blank, remember that the values in the geojson file and the Elasticsearch document need to be the same. 
Not just similar, exactly the same. If you aren't already adjusting your mappings, make sure you do and set the field that contains the value to a keyword type to ensure it doesn't get split and/or lower-cased like a text field would. Ultimately you need to make sure your processing pipeline normalises the field values to the geojson file values. From 6.1 there will be improvements on how we handle failed joins. We will allow users to turn off warnings for failed joins, so when some terms aren't present in the geojson source, the dashboard won't be overloaded with error warnings. CORS Mate! CORS stands for Cross-Origin Resource Sharing and is an important feature of the browser's security model. CORS configurations define how a browser decides what content any given web-application has access to, based on the domain (or origin) of where that web application is hosted. The default behavior is simple: Browsers do NOT allow Javascript to load or post any content cross-origin. If they did, the browser would make data-theft very easy! After all, as administrators, we generally do not want a web-application to retrieve (possibly malicious) code or post (possibly private) content from or to another domain we do not control. Our users certainly would not want this! For example, javascript-code from the Kibana application hosted on the cannot execute a request to fetch files hosted on , as they are on two different domains. To work around this, servers can advertise which other domains they trust. This is a server's CORS configuration. As an example, the admins of accounting.example.org can configure the CORS-headers of their server so it will accept requests from the engineering.example.org domain. The browser's behavior would then look something like this: When users visit Kibana running on the engineering.example.org domain, the browser will first send a preliminary 'test'-request to accounting.example.org. When the server on accounting.example.org responds that it accepts requests from websites hosted on the engineering.example.org domain, the browser will then send the real request for the data to accounting.example.org. So why did we decide to require users to set up CORS? After all, why can't the Kibana server serve the geojson file so we can avoid this whole CORS business? The reason for this is that geodata comes from many different sources, and a lot of organisations already have an established stack for geo-data deployed, products like Geoserver or ArcGIS server. It is very common that these dedicated services would be running on a different domain from where Kibana will be hosted. And That's It! One last thing that we will note is that the Region map provides a zoom level of 10 by default. To increase the zoom levels, install X-Pack alongside Kibana and Elasticsearch and then register for for all 18 levels! We hope this has been useful and given you some great ideas on how to use custom maps with the flexibility built natively into the Elastic Stack. Imagine extending this to show voting levels based on your state/district/electoral areas, or grades for schools based on their coverage area, or even showing . It's now even easier if you have the data. Feel free to head on over to if you have any further questions, or if you just want to share your awesome custom Region map successes, and thanks for reading.
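Following the troubleshooting advice above, you can make the join reliable by mapping the join field explicitly before indexing. A minimal sketch reusing the aussiestates example from this post (the keyword type avoids the splitting and lower-casing that an analysed text field would apply):

PUT aussiestates
{
  "mappings": {
    "doc": {
      "properties": {
        "Name": { "type": "keyword" },
        "Population": { "type": "long" }
      }
    }
  }
}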
Elastic APM enters Beta with new UI;;/blog/elastic-apm-beta-released;Rasmus Makwarth;December 12, 2017;Releases;; We're excited to share that Elastic APM is now in beta! In 6.0, we released the alpha, which included the open source APM Server and agents. With this release, we're now also providing you with a dedicated Kibana UI to easily visualize and debug performance bottlenecks and errors in your code. We want to help developers spend less time in the develop-test-deploy loop, and be able to ship code changes with confidence. To accomplish that, we've designed a UI specifically for the developers who wrote the code. The UI is available for free via X-Pack with a Basic license, and you get it out of the box with the 6.1 release. Here's a preview of the UI in action: Zero UI configuration Once you install the Elastic APM agent library in your application, the application will automatically appear in the UI. Installing agents is as easy as installing the Elastic APM agent for your programming language and copy/pasting a few lines of configuration. The agent will automatically instrument your application and send performance data through to the APM Server. The APM Server will process and index the data into Elasticsearch. Visualize application bottlenecks APM monitors transactions and errors in your application. A transaction can be a request to your server or a batch job or a custom transaction type. Out of the box, you'll get response times, requests per minute, status codes per endpoint, plus the ability to dive into a specific request sample and get a complete waterfall view of what your application is spending its time on - like database queries, cache calls, external requests, etc. This lets you easily compare and debug fast responses to slow responses. For each incoming request and each application error you also get access to contextual information, like request headers, user information, system values or custom data that you can manually attach to the request. Having access to application-level insights with just a few clicks will drastically decrease time spent on debugging 500s, slow response times and crashes. Correlate APM with other data sources using dashboards If you're already using the Elastic Stack for logging and server-level metrics, you can easily import the prepacked APM dashboards that come with the APM Server. This will enable you to easily grab APM specific visualizations and use those to correlate APM data with other data sources. To get the dashboards, run the following command in the APM Server 6.1 directory: Get started today The UI is now available as part of Kibana and activated via X-Pack with a Basic license. It's free - and you can . If you have questions or feedback, please drop us a note in the .
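As an illustration of the few lines of configuration mentioned above, here is a minimal sketch for the Node.js agent (the service name and server URL are placeholders; other agents follow a similar pattern, so check the agent docs for your language and version):

// Load and start the agent before any other modules in your application
const apm = require('elastic-apm-node').start({
  serviceName: 'my-service',           // how the application shows up in the APM UI
  serverUrl: 'http://localhost:8200'   // the APM Server this agent reports to
});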
Logstash Lines: Update for December 12, 2017;;/blog/logstash-lines-2017-12-07;Andrew Cholakian;December 12, 2017;The Logstash Lines;; Hello Logstashers! We're glad to present you all the past couple of weeks' news in convenient digest form! We've now for resetting logging settings changed via the Logstash web API back to their defaults. We've added some much-needed improvements and fixes for Logstash's log4j logging. We will cap log files at 100MB per file, and gzip files as they're rolled over automatically. We're targeting 6.2 with this change. The Logstash HTTP input previously exhibited poor behavior when the queue was blocked. If a client connected while the queue was blocked, LS would not release the connection, but rather block indefinitely, potentially causing the client to time out. The HTTP protocol doesn’t deal well with long running requests. This plugin will now either return a 429 (busy) error when Logstash is backlogged, or it will time out the request. If a 429 error is encountered clients should sleep, backing off exponentially with some random jitter, then retry their request. This plugin will block if the Logstash queue is blocked and there are available HTTP input threads. This will cause most HTTP clients to time out. Sent events will still be processed in this case. This behavior is not optimal and will be changed in a future release. In the future, this plugin will always return a 429 if the queue is busy, and will not time out in the event of a busy queue. While doing this work, we discovered that the HTTP input had some synchronization bottlenecks which were unnecessary. Users might see a nice perf boost on multicore systems. One of the big changes in 6.0 is our work on a new execution backend as well as a new IR for our compiler. We've seen a number of small discrepancies pop up in the 6.x series that we've been ironing out with both the new IR (used by all users) and the new execution engine (will be optional with 6.1). We've done this work in a .
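A rough sketch of the retry behaviour recommended above for clients of the HTTP input, written as illustrative Ruby (this is a client-side pattern, not part of Logstash itself): on a 429 response, sleep with exponential backoff plus random jitter, then retry.

require 'net/http'

def post_with_backoff(url, body, max_attempts: 5)
  attempt = 0
  loop do
    response = Net::HTTP.post(URI(url), body)
    # Retry only on 429 (busy); back off exponentially with random jitter
    return response unless response.code == '429' && attempt < max_attempts
    sleep((2 ** attempt) + rand)
    attempt += 1
  end
end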
This Week in Elasticsearch and Apache Lucene - 2017-12-11;;/blog/this-week-in-elasticsearch-and-apache-lucene-2017-12-11;Clinton Gormley;December 11, 2017;This week in Elasticsearch and Apache Lucene;; Welcome to ! With this weekly series, we're bringing you an update on all things Elasticsearch and Apache Lucene at Elastic, including the latest on commits, releases and other learning resources. A new dynamic cluster setting in 7.0 tracks the number of buckets created in an aggregation and fails the request if this number reaches the limit. The default for 7.0 is set to 10000, this means that any request that tries to return more than this number will fail. The number of buckets is checked in the coordinating node and also in each shard when we build the response. This setting should help to reject bad requests that were not caught by the circuit breaker but could push memory use over the limit. This setting will also be present in 6.2 but it will be disabled by default. Requests in this version that hit the default limit (10,000) will log a deprecation in order to prepare users for the migration to 7.0. X-Pack security will now filter the mappings fields returned by get index, get mappings, get field mappings and field capabilities APIs. This means that fields that a user cannot access due to field-level security will no longer be returned from these APIs. Since 5.0, it has been possible to remove index or cluster settings by setting them to . This should have worked for wildcard settings too (eg ) but a bug prevented that from working correctly. Finally, unknown settings from 2.x indices/cluster were moved to the namespace, but it was impossible to delete settings from indices as this was rewritten to . Both of these issues have been . The jvm.options file syntax has changed in 6.2 in order to support a breaking change in command line arguments in Java 9. Each option now needs to be preceded by a . Apache Lucene We are fixing before building the first release candidate. We should hopefully have a release in the coming weeks. was merged, allowing for great speedups when only the top matches are needed. This optimization requires scorers to expose a maximum score that they may contribute and currently only works well with BM25, but we are working on to work well with this optimization and . We are also looking into how we could , or maybe even the per-norm per-term maximum term frequencies in order to , which would in-turm make the optimization more efficient. It turns out the same API could be used to , when the term term frequency of the most frequent term is not high enough to produce a competitive score. At index time, documents are first buffered in an in-memory . Seeing it as a single buffer is a bit of a simplification though: there is actually a set of index buffers. This helps with concurrency since different threads can write to different index buffers concurrently. When refreshing, each index buffer writes a segment.In order to make multi-tenancy easy, Elasticsearch adds an abstraction layer on top of Lucene called which tries to make sure that each shard can use as much memory as possible for these buffers at index-time, because it makes indexing faster, while also ensuring that the total amount of memory that is spent on the index buffers across shards does not exceed . The issue is that when this shared limit is reached, Elasticsearch tells Lucene to do a refresh, which writes _all_ per-thread buffers to disk. 
This new feature is a way to tell Lucene to release some of the memory it spends on indexing, only flushing one of the largest per-thread buffers. This means we will write larger index buffers on average, which should make indexing more efficient.
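Returning to the aggregation bucket limit described at the start of this update: adjusting the dynamic cluster setting would look roughly like the sketch below. The setting name search.max_buckets is our assumption; verify it against the release notes for your version before relying on it.

PUT _cluster/settings
{
  "transient": {
    "search.max_buckets": 20000
  }
}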
Keeping up with Kibana: This week in Kibana for December 11, 2017;;/blog/keeping-up-with-kibana-2017-12-11;Jim Goodwin;December 11, 2017;Kurrently in Kibana;; Welcome to Keeping up with Kibana! This is a weekly series of posts on new developments in the Kibana project and any related learning resources and events. We’re proud to present 6.0 with accessibility improvements, one-click CSV exports, tighter access control with Dashboard Only Mode, and more. Learn more: — elastic (@elastic) Hi all, This blog is going to be a different style going forward. Instead of being mostly a change log, I'm going to try to highlight some important topics in Kibana development each week. So, here we go... A while ago we decided to move towards React as our rendering technology and away from Angular. We've also been developing a new UI design for Kibana which we refer to as K7 (who knows what the 7 means...) and if you follow along in github.com there has been a frenzy of activity around it. This includes the creation of a set of reusable React components that we call EUI ( ) and a complete redesign of our CSS structure. In order to reach our goals we're going to start using EUI in Kibana and migrating our UIs to it during the 6.x series of releases. We've got a skin that closely mimics the 6.x design using the EUI components. This will allow us to continue feature development and keep back-porting sane. This week we'll be merging EUI into Kibana and upgrading to React 16, and then cleaning up any rendering issues that come up. Our plan is to have things stable quickly. You can read more at EUIfying Kibana . Earlier this year at Elastic{ON} ( at 1h28m ) we showed a prototype of a new application in Kibana called Canvas. Canvas is a truly innovative presentation tool that integrates live data and analysis from Elasticsearch into info-graphic-like presentations. This week we have a technology preview of Canvas that you can as well as a Canvas blog so that you can follow along with all the updates. That's all for this week. Cheers, Jim
Welcome to the Elastic Advent Calendar! Looking back on Week One;;/blog/elastic-advent-calendar-2017-week-one;Aaron Aldrich;December 07, 2017;Engineering;; This year, the Elastic Engineering team wanted to do something special to celebrate the end of the 2017 Calendar year. Drawing heavy inspiration from the (fully in Japanese) and (in English), we’ve decided to join the tech-advent tradition. From the 1st through 25th of December the team is publishing a series of Elastic Stack tips each day at 00:00 UTC. And —as we are a distributed organization with community members from all corners of the globe— you’ll be seeing these threads in a variety of our native languages. Just this week we have contributions in English, Japanese, French, and Korean! If you want to be actively notified of new topics, subscribe to the ! We’ll also be posting weekly summary blog posts (like this one) if you prefer to follow along here, and we encourage you to join in the conversation either way: topics are open for feedback and questions and we’d love to hear from you. Here’s what we’ve posted from our first week It’s a great collection of content packed into some mighty small space, and we’d love to hear your feedback on the posts. Happy reading!
Brewing in Beats: Recursive file watching on macOS with Auditbeat;;/blog/brewing-in-beats-recursive-file-watching-on-macos-with-auditbeat;Monica Sarbu;December 07, 2017;Brewing in Beats;; Did you know that is already available? Try it and let us know what you think. If you are curious to see the Beats 6.0 in action, we just published the . This update covers the last two weeks Auditbeat: recursive file watching on macOSAuditbeat now supports . This functionality is based on the library. One drawback of FSEvents is that in the case of multiple events on the same file, they have coalesced in a single notification. The PR orders the set of actions in a single event to be meaningful depending if the file existed in the beat database and if it doesn’t exist anymore at the moment of processing the event. Packetbeat: add_kubernetes_metadataAfter Filebeat and Metricbeat, Packetbeat is the next in line to get Kubernetes support. The `add_kubernetes_metada` processor is now with the pods and enhance the events with Kubernetes metadata. This feature was merged into master and is scheduled to be released in 6.2. Packetbeat: Several TLS support enhancementsPacketbeat now includes a for the TLS data. It can also report the , which is defined as the time spent between first packet and completion of the handshake. Finally, it can now calculate for the client TLS sessions. The JA3 fingerprints are efficient for detecting malware or unauthorized applications. These features are merged into master and are scheduled to be released with 6.2. Filebeat: use the local timezone in the system moduleAn issue that we had in Filebeat modules was that the Ingest Node pipelines assume the incoming logs have timestamps in UTC. In 6.1, Elasticsearch is getting the ability to parse timestamp in the timezone specified by . We now , so the local timezone can be correctly used when decoding the timestamp. This feature will be present in 6.1 but disabled by default. Filebeat and Metricbeat modules for Logstash monitoringThe Filebeat module for Logstash was in time for 6.1. The Metricbeat module got a with basic event stats, also in time for 6.1. Other changes:Repository: elastic/beatsAffecting all BeatsChanges in master: Changes in 6.1: Changes in 6.0: MetricbeatChanges in master: Changes in 5.6: Changes in 6.1: Changes in 6.0: PacketbeatChanges in master: FilebeatChanges in master: HeartbeatChanges in master: ProcessorsChanges in master: TestingChanges in master: Changes in 6.1: Changes in 6.0: InfrastructureChanges in master: Changes in 6.1: PackagingChanges in master: Changes in 6.0: DocumentationChanges in master: Changes in 6.1: Changes in 6.0: Repository: elastic/gosigarChanges in master:
Canvas Technology Preview;;/blog/canvas-tech-preview;Rashid Khan;December 06, 2017;Engineering;; Hey. You there. Want a sneak peek at a product in progress? We showed off an early prototype of Canvas at Elastic{ON} 17 earlier this year and since then we’ve been hacking on translating the concepts from that prototype into something we can share with you. Let’s be clear: This is still early stuff. But we’re having enough fun with it in the lab that we want to share fun with you. Well, probably you anyway.Is this preview for me? Have you ever taken something apart with no intention of putting it back together, just to see how it works? Can you make a half passable drink in the absence of a half passable liquor cabinet? Would you ride your bike down a road just because you’ve never ridden a bike down that road before? Do you do things just to prove to yourself that you can? If you answered yes, or no, to any or all, of these questions, then Canvas might be for you. These rough cuts of Canvas are for those with a bit of curiosity, some imagination and a whole lot of crazy. One more question: Do you run everything and anything in production? If so, Canvas isn’t for you. Canvas is marching forward rapidly, so we recommend that you try it out in an environment that won’t anger the masses should a hiccup occur. Ok, what is it? Hold on, we’re getting ahead of ourselves. There’s a section on that later. In short, Canvas is a composable, extendable, creative space for live data. The goal of Canvas is to be flexible, and allow you to tweak all bits and pieces required to get to the result you desire. Canvas presents you with a blank page to which you can add a selection of elements. These elements can be connected to data and configured with a simple UI. Within the sidebar you can play with palettes, fonts, background, borders and more. When there’s a style you have to have, but there isn’t a button, Canvas gives you the option of using raw CSS. Canvas today can’t do everything (though we’re working on that!) but its capabilities go much deeper than what you see on the surface. Sometimes you need to manipulate the style, or even the data, in a way that no graphical interface can easily express. The Canvas interface is intentionally compact because many of the most interesting and powerful features of Canvas are contained within a small window at the bottom of the screen. Hidden behind a button that says is the nerve center of Canvas. The interfaces you touch in Canvas, are in fact manipulating the Canvas expression in this text box. The expression describes everything Canvas needs to do to create your element. When you change something in the sidebar, for example a color palette, Canvas updates that expression in the right place. Not only can Canvas update the expression, but you can too and Canvas will do its best to create an interface to whatever you type. For example if you add or change the palette in the expression, the sidebar will update to reflect the expression in turn. There isn’t an interface to every function and argument, but we’re steadily adding new ones. Canvas expressions are simple, but like any sufficiently powerful concept, there is a learning curve. At its heart it is a pipe based syntax, and the output of one function flows into the next: Here we’re using the function to search for the string “error” and retrieve the @timestamp and bytes fields for the most recent 100 document. 
Then we’re using the function to assign those fields to dimensions, and finally piping into to put the points on a simple chart. This is, of course, a very simple example, all of these functions have arguments we aren't using. Canvas has dozens of functions and many more capabilities including table transforms, type casting and sub-expressions. The best way to learn right now is to open the code screen and play with the sidebar to see how it manipulates the expression. If you really want to dig deep see the . What doesn’t it do? (aka: What’s coming?) See, I told you there was a section for this. Let’s get this out of the way. Elasticsearch aggregations: It doesn’t do those yet. While we could build a function to perform aggregations it wouldn’t change anything else about how Canvas works, so we figured we’d rather share Canvas with you sooner. Plus, we’re working to ensure that Canvas supports as soon as it’s ready. What it does support is working with, and charting, raw Elasticsearch documents, which is something we haven’t historically done in Kibana. That means you can do scatter style plots of low density data sets. We’re also working to integrate Canvas with X-Pack Reporting so you can email PDFs and print out hard copies of Canvas work pads. We have a proof of concept, we just need to get everything tied in, look for it soon! There’s a million other little features we’re working on, and a few big ones we’re still exploring, so you can expect to see many changes to Canvas in the near future. What a time to be alive. How do I learn more? We’ve put together a series of short videos and articles highlighting how Canvas works. Like Canvas itself these are in an ever changing state of forward progress so we’ve broken them off into a micro-site just for Canvas over at . How do I install it? Pop over to for installation instructions. Canvas installs like any other Kibana plugin, but there’s a couple things you should know. Plus that site always has install instruction for the latest and greatest Canvas build. How do I give you feedback? In the lower right hand corner of the application you’ll find a “Give Feedback” button. We’d love to hear about how you’re using Canvas, what you’re making and what you’d like to be able to do. If you don’t need a response and just want to let us know about something, submit feedback there. Otherwise, if you have a question, join us in the of
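The individual function names were lost from the expression walkthrough above. As a best-effort reconstruction (esdocs, pointseries, and plot are assumptions based on early Canvas builds, and the argument names may differ in your version), the expression being described looks roughly like:

esdocs query="error" fields="@timestamp, bytes" count=100
| pointseries x="@timestamp" y="bytes"
| plot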
Developing new Kibana visualizations;Creating new Kibana visualizations and embedding them into your own plugin;/blog/developing-new-kibana-visualizations;Peter Pišljar;December 06, 2017;Engineering;; Creating new visualization typesIn 6.0 we made some significant changes to the visualize API and how visualizations are implemented. We also which should provide basic description of the interface. This tutorial should help you get started quickly. Let us discuss some terms firstWe will build a visualization similar to Kibana's built-in Metric visualization, except we will try to keep it simple without any advanced options. When you start with creating new metric visualization in Kibana you are presented with a screen looking something like this: On the left side we have the default editor and on the right side there is the actual visualization. If you look at the side editor you notice it consists of few parts, which are numbered in the image above. Bootstrapping the pluginNew visualizations are just Kibana plugins with the right . We won't go over all the details. Instead, let's start with creating a folder inside Kibana's plugins directory. We will use a super smart name of 'test_vis'. { name: test_vis, version: kibana } export default function (kibana) { return new kibana.Plugin({ uiExports: { visTypes: [ 'plugins/test_vis/test_vis' ] } }): } Visualization definitionThe file referenced in above file will define a new visualization type and register it. As it is there are multiple factories depending on the rendering technology you are using. You could even extend it with your own. But in this tutorial we will use the , which is the one we recommend to use. Base visualization type does not make any assumptions about the rendering technology. Let's look at our example . First we import our styles and option template (we will talk about it later) as well as our visualization controller, which will take care of rendering the visualization. import './test_vis.less': import optionsTemplate from './options_template.html': import { VisController } from './vis_controller': import { CATEGORY } from 'ui/vis/vis_category': import { VisFactoryProvider } from 'ui/vis/vis_factory': import { VisTypesRegistryProvider } from 'ui/registry/vis_types': import { VisSchemasProvider } from 'ui/vis/editors/default/schemas': function TestVisProvider(Private) { const VisFactory = Private(VisFactoryProvider): const Schemas = Private(VisSchemasProvider): } The method of the accepts the definition object. Take a look at the documentation to get a better idea of what these properties do. The important parts are that we define the name for our visualization, the controller class which will take care of rendering, the default configuration of the visualization itself and the configuration for the editor. return VisFactory.createBaseVisualization({ name: 'test_vis', title: 'Test Vis', icon: 'fa fa-gear', description: 'test vuis', category: CATEGORY.OTHER, visualization: VisController, visConfig: { defaults: { // add default parameters fontSize: '30' }, }, We will be using the default Kibana editor in this tutorial. This is the side editor you see in many Kibana visualizations. We need to provide the , which should be the angular template for the options tab. We also need to provide definition, which tells which aggregations can be configured. In the below example our schema definition contains a single object of group metrics. 
The minimum is set to one (so users will have to configure at least one metric), some aggregations are excluded from the list and the default configuration is provided. editorConfig: { optionsTemplate: optionsTemplate, schemas: new Schemas([ { group: 'metrics', name: 'metric', title: 'Metric', min: 1, aggFilter: ['!derivative', '!geo_centroid'], defaults: [ { type: 'count', schema: 'metric' } ] } ]), } }): } At the end we need to register our provider function with the . // register the provider with the visTypes registry VisTypesRegistryProvider.register(TestVisProvider): Visualization OptionsIn the visualization definition we set the default options for the visualization. In this case, . We could provide more configuration options there and nest them as we like. We also need to provide the UI for changing them. The property on the allows us to provide an angular template to do just that. We could also provide . Our looks something like this. Note how we reference the parameter with the variable. <div class=form-group> <label>Font Size - {{ vis.params.fontSize }}pt</label> <input type=range ng-model=vis.params.fontSize class=form-control min=12 max=120 /> </div> But what if we need more control over it ? Instead of providing HTML we could also provide an angular directive: import './options_directive': … optionsTemplate: '<myvis_options_directive></myvis_options_directive>' ... Visualization ControllerThe last missing part is the visualization controller. This is the actual code that will render the visualization to the screen. We need to create a class with and functions. The constructor will get the DOM element to which visualization should be rendered and the object. It should prepare everything it can before the data is available. class VisController { constructor(el, vis) { this.vis = vis: this.el = el: this.container = document.createElement('div'): this.container.className = 'myvis-container-div': this.el.appendChild(this.container): } The function needs to clean up. Here we just remove all the DOM. we should also remove any hanging listeners or pending timers. destroy() { this.el.innerHTML = '': } The render method will receive the data object along with status object. It will be called every time a change happens which requires an update of visualization like changing time range, filters, query, uiState, aggregation, container size or visualization configuration. Here we re-render whole visualization every time, but this is not the most optimal behaviour. Your code could inspect the object to find out what exactly triggered the render call (was it a change in time range for example?) and update accordingly to that. For example a change in container size should probably not require to redraw the whole thing. render(visData, status) { this.container.innerHTML = '': return new Promise(resolve => {. resolve('when done rendering'): }): } }: export { VisController }: Now we need to provide our visualization implementation. First, we’ll extract the data. Note how we can’t rely on to be present, but we should always check if it is and use to correctly format the value in such case. As we didn’t provide a to our visualization it will use the default response handler, which returns data in tabular format. const table = visData.tables[0]: const metrics = []: table.columns.forEach((column, i) => { const value = table.rows[0][i]: metrics.push({ title: column.title, value: value, formattedValue: column.aggConfig ? 
column.aggConfig.fieldFormatter('text')(value) : value, aggConfig: column.aggConfig }); }); And at last we add the elements to the DOM: metrics.forEach(metric => { const metricDiv = document.createElement('div'); metricDiv.className = 'myvis-metric-div'; metricDiv.innerHTML = `<b>${metric.title}:</b> ${metric.formattedValue}`; this.container.appendChild(metricDiv); }); Using parametersIn our visualization definition we defined the fontSize parameter. We can access it inside the visualization through this.vis.params: metrics.forEach(metric => { const metricDiv = document.createElement('div'); metricDiv.className = 'myvis-metric-div'; metricDiv.innerHTML = `<b>${metric.title}:</b> ${metric.formattedValue}`; metricDiv.setAttribute('style', `font-size: ${this.vis.params.fontSize}pt`); this.container.appendChild(metricDiv); }); Adding bucket configurationMany Kibana visualizations allow you to define a bucket aggregation and then show you a metric for every bucket. For example, this is especially useful with date histograms where each date could be one bucket. To add bucket support to our visualization we first need to tell the editor that we support buckets: editorConfig: { optionsTemplate: optionsTemplate, schemas: new Schemas([ { group: 'metrics', name: 'metric', title: 'Metric', min: 1, aggFilter: ['!derivative', '!geo_centroid'], defaults: [ { type: 'count', schema: 'metric' } ] }, { group: 'buckets', name: 'segment', title: 'Bucket Split', min: 0, max: 1, aggFilter: ['!geohash_grid', '!filter'] } ]), } }); } And we need to handle them in our visualization controller's render method: const table = visData.tables[0]; const metrics = []; let bucketAgg; table.columns.forEach((column, i) => { // we have multiple rows … first column is a bucket agg if (table.rows.length > 1 && i == 0) { bucketAgg = column.aggConfig; return; } table.rows.forEach(row => { const value = row[i]; metrics.push({ title: bucketAgg ? `${row[0]} ${column.title}` : column.title, value: row[i], formattedValue: column.aggConfig ? column.aggConfig.fieldFormatter('text')(value) : value, bucketValue: bucketAgg ? row[0] : null, aggConfig: column.aggConfig }); }); }); Adding eventsWhat about handling click events? Easy! We can add a click handler to our DOM elements: metrics.forEach(metric => { const metricDiv = document.createElement('div'); metricDiv.className = 'myvis-metric-div'; metricDiv.innerHTML = `<b>${metric.title}:</b> ${metric.formattedValue}`; metricDiv.setAttribute('style', `font-size: ${this.vis.params.fontSize}pt`); metricDiv.addEventListener('click', () => { if (!bucketAgg) return; const filter = bucketAgg.createFilter(metric.bucketValue); this.vis.API.queryFilter.addFilters(filter); }); this.container.appendChild(metricDiv); }); When the click event fires, we create a filter for the selected value. We then add this filter to the filter bar using the queryFilter.addFilters method. Using Kibana visualizations in your pluginIn 6.0 using existing Kibana visualizations in your own plugins has become much easier. We also around it. So let’s create a new plugin. There is already a community resource available on , so we will not go into depth on that. Let’s quickly review the steps. The file will create a new plugin object and define the for the plugin. In we will define our app. If we want to be able to use existing Kibana visualizations we need to tell Kibana which modules we will use. We also need to inject some variables from Kibana. 
Here is the example file for my plugin: export default function (kibana) { return new kibana.Plugin({ uiExports: { app: { title: 'Test Visualize', description: 'This is a sample plugin', main: 'plugins/test_visualize_app/test_vis_app', uses: [ 'visTypes', 'visResponseHandlers', 'visRequestHandlers', 'visEditorTypes', 'savedObjectTypes', 'spyModes', 'fieldFormats', ], injectVars: (server) => { return server.plugins.kibana.injectVars(server); } } } }); } Here is a minimal example of test_vis_app.js: require('ui/autoload/all'); require('ui/routes').enable(); require('ui/chrome'); import './test_vis_app.less'; import './test_vis_app_controller.js'; const app = require('ui/modules').get('apps/test_app', []); require('ui/routes').when('/', { template: require('./test_vis_app.html'), reloadOnSearch: false, }); We will come back to the template HTML file, the .less file for the styles, and the controller file later. Embedding saved visualizations in your pluginOur first application will be really simple. It will have a select control with all the saved visualizations on top and it will render the selected visualization in the main area. Getting a list of saved visualizationsLet's create the controller file. This will define a simple angular controller that holds the main logic of our application. First we need to load the visualize loader, which will help us get the list of saved visualizations as well as embed those visualizations into our DOM. To load saved visualizations we will use the getVisualizationList method, which will return a list of all saved visualizations. Each object in the list will contain: import { getVisualizeLoader } from 'ui/visualize/loader'; const app = require('ui/modules').get('apps/kibana_sample_plugin', []); app.controller('TestVisApp', function ($scope) { $scope.visualizationList = null; $scope.selectedVisualization = null; let visualizeLoader = null; getVisualizeLoader().then(loader => { visualizeLoader = loader; loader.getVisualizationList().then(list => { $scope.visualizationList = list; }); }); }); We also need to add a template file to show our dropdown and prepare a placeholder where we will render visualizations. This is the file we referenced in the route definition above. Note how we set ng-controller to use the controller we just defined. <div class="test-vis-app app-container" ng-controller="TestVisApp"> <div class="test-vis-app-selector"> <select ng-options="item.id as item.title for item in visualizationList" ng-model="selectedVisualization"></select> </div> <div class="test-vis-app-visualize"></div> </div> If there are any saved visualizations in this Kibana instance you should get the dropdown filled with them. The div with class test-vis-app-visualize will be used as a container where we'll load our visualizations. It's important that we set this div to use the flex display, else the visualization will not render correctly; we also need to set the flex display on the visualization and visualize directives. Let's update our .less file: .test-vis-app, .test-vis-app-visualize, visualize, visualization { display: flex; flex: 1 1 100%; } Embedding the visualizationTo embed the visualization we will also use the visualize loader, more specifically its embedVisualizationWithId method, to which we need to provide the DOM element into which it should render, the visualization id, as well as all the other parameters that you can pass to the visualize directive. As we don’t have a time picker in our plugin we will need to provide the time range. The timeRange accepts regular timestamps as well as date math expressions. 
const visContainer = $('.test-vis-app-visualize'); const timeRange = { min: 'now-7d/d', max: 'now' }; $scope.$watch('selectedVisualization', (visualizationId) => { if (!visualizationId) return; visualizeLoader.embedVisualizationWithId(visContainer, visualizationId, { timeRange: timeRange }); }); And this is it. Reload Kibana and test it out. Passing additional options to the visualizationIn many scenarios you will have a requirement to pass additional options to the visualization, like the timeRange we mentioned above or a showSpyPanel option. visualizeLoader.embedVisualizationWithId(visContainer, visualizationId, { timeRange: timeRange, showSpyPanel: false }); Take a look at to find out about additional options. How can I know when the visualization is done rendering?The embedVisualizationWithId function will return a promise (WARNING: in 6.2 this behaviour will change and the method will return the handler directly). Once the promise is resolved the visualization is done rendering. The promise gets resolved with a handler object which has a destroy method which you should call when you want to clean up: visualizeLoader.embedVisualizationWithId(visContainer, visualizationId, { timeRange: timeRange }).then(handler => { console.log('done rendering'); …. handler.destroy(); // call to clean up }); Embedding a saved objectSometimes you will need to have more control over the saved visualization (maybe you want to modify it slightly prior to rendering, add additional filters, or apply a query) or you might not have a saved visualization at all but have used a different way to obtain a saved object. Let's assume you want to load a saved visualization but modify its searchSource prior to rendering it. We will use the savedVisualizations service to load the visualization and then the embedVisualizationWithSavedObject method to embed it into our DOM. import { FilterManagerProvider } from 'ui/filter_manager'; app.controller('TestVisApp', function ($scope, Private, savedVisualizations) { const filterManager = Private(FilterManagerProvider); $scope.$watch('selectedVisualization', (visualizationId) => { if (!visualizationId) return; savedVisualizations.get(visualizationId).then(savedObj => { const filters = filterManager.generate('response.raw', '200'); savedObj.searchSource.get('filter').push(filters[0]); visualizeLoader.embedVisualizationWithSavedObject(visContainer, savedObj, { timeRange: timeRange }); }); }); }); Using Kibana visualizationsAbove we looked into how we can render a saved Kibana visualization inside our plugin. But what about using Kibana visualization types with our own data and configuration, without actually saving anything? For this purpose we are going to use the visualization directive. We will need to import ui/visualize and the VisProvider: import 'ui/visualize'; import { VisProvider } from 'ui/vis'; In this example we will render a simple tag cloud. Tag cloud uses the response handler, which means we need to provide it data in that format. Let's create our data structure: $scope.myVisData = { tables: [{ columns: [ { title: 'Tag' }, { title: 'Count' } ], rows: [ [ 'test', 100 ], [ 'tag', 150 ], [ 'for', 200 ], [ 'tagcloud', 10 ], ] }] }; As you can see the data structure is very simple. At the top level there is an object with a tables property. There can be multiple tables listed, but for this example we use just one. Each table has a columns property which is an array of columns. Each column object has a title property. Each table also has a rows property, which is an array of rows, where each row is an array of cell values. We will also need to provide the configuration for the visualization. 
In this example we are going to keep it to the minimum: const visConfig = { type: 'tagcloud' }; Now we can create our visualization object: const Vis = Private(VisProvider); $scope.myVis = new Vis('logstash-*', visConfig); All we are missing is the template to render this: <visualization vis="myVis" vis-data="myVisData"></visualization> How about a similar example with region maps? We first need to add the serviceSettings dependency to our controller: app.controller('TestVisApp', function ($scope, Private, savedVisualizations, serviceSettings) { The serviceSettings service is used to load the layers that the map can use. Now we can load the available region map layers: serviceSettings.getFileLayers() .then(function (layersFromService) { And prepare our visConfig and data: $scope.myVisData2 = { tables: [{ columns: [ { title: 'Tag' }, { title: 'Count' } ], rows: [ [ 'GB', 100 ], [ 'FR', 150 ], [ 'DE', 200 ], [ 'ES', 10 ], ] }] }; const visConfig2 = { type: 'region_map', params: { selectedLayer: layersFromService[1], selectedJoinField: layersFromService[1].fields[0] } }; $scope.myVis2 = new Vis('.logstash*', visConfig2); }); <visualization ng-if="myVis2" vis="myVis2" vis-data="myVisData2"></visualization> Where to go from here?You can get all the code used above on GitHub. is the first part, creating your own visualization. And is the second part, using Kibana visualizations in your own plugin. We linked to the in quite a few places above. And if you need additional help don’t hesitate to contact us on . However, to go more in depth you will probably need to dive into .
Kibana 5.6.5 and 6.0.1 released;;/blog/kibana-5-6-5-and-6-0-1-released;Jim Goodwin;December 06, 2017;Releases;; Hello, and welcome to the 5.6.5 and 6.0.1 releases of Kibana! These releases of Kibana include an important security fix; we recommend that you upgrade to either 5.6.5 or 6.0.1 to correct the problem. Kibana 5.6.5 and 6.0.1 are available on our and on . Please review the release notes for and for the rest of the enhancements and bug fixes.
Elasticsearch at RTE: Blackout Prevention through Weather Prediction;;/blog/elasticsearch-at-rte-blackout-prevention-through-weather-prediction;Akli Rahmoun;November 27, 2017;User Stories;; About RTEAt the core of the power system, RTE (Réseau de transport d’électricité) keeps the balance between power consumption and generation. Twenty-four hours a day and seven days a week, we play a key role in directing the flow of electricity and maximizing power system efficiency for our customers and the community. We convey electricity throughout mainland France, from power generation facilities to industrial consumers who are connected to the transmission grid, and to the distribution grid which provide the link between RTE and end users. We operate France's high and extra-high voltage transmission system, the biggest in Europe. Our Daily ChallengeThe electrical resistance of a power line causes it to produce more heat as the current it carries increases. If this heat is not sufficiently dissipated, the metal conductor in the line may soften to the extent that it sags under its own weight between supporting structures. If the line sags too low, a flash over to nearby objects (such as trees) may occur, causing a transient increase in current. Automatic protective relays detect the excessively high current and quickly disconnect the line, with the load previously carried by the line transferred to other lines. If the other lines do not have enough spare capacity to accommodate the extra current, their overload protection will react as well, causing a cascading failure. Eventually, this can lead to a widespread power outage (blackout), like the one that occurred in Northeastern and Midwestern United States and the Canadian province of Ontario on Thursday, August 14, 2003. This incident had major adverse effects on the proper functioning of the regional economy, administration, public services, and more generally, on people’s daily lives. Power plants went offline to prevent damage in the case of an overload, forcing homes and businesses to limit power usage. Some areas lost water pressure because pumps lacked power, causing potential contamination of the water supply. Railroad service, airports, gas stations, and oil refineries had to interrupt service due to lack of electricity. Cellular communication devices were disrupted and cable television systems were disabled. Large numbers of factories were closed in the affected area, and others outside the area were forced to close or slow work because of supply problems and the need to conserve energy while the grid was being stabilized. Unleashing the Power of Numerical Weather Prediction DataBasically, the problem we are trying to solve consists of dynamically determining the sag margin without violating clearance requirements. A way of solving this problem is Dynamic Line Rating (DLR). The DLR prediction model aims to answer this simple question: What is a transmission line’s maximum instantaneous current carrying capacity after accounting for the effects of weather (temperature, wind, and solar radiation) on thermal damage and line sag? To answer that question, we used data provided by Météo France, the French national meteorological service. This data is formatted into GRIB2 files that can be sourced from Météo France’s open data platform. is a file format for the storage and transport of gridded meteorological data, such as output from the Numerical Weather Prediction model. 
It is designed to be self-describing, compact, and portable across computer architectures. The GRIB standard was designed and is maintained by the World Meteorological Organization. How did the Elastic Stack Help us Respond to the Challenge?The goal of the POC was to provide the easiest and most powerful access to this weather data to end users. We found out that Elasticsearch’s powerful ingest capabilities, geo indexing, and query features could help us achieve our goal efficiently and at scale both in terms of throughput and storage size. Let’s go deeper in how we built the data processing stream using the Elastic Stack. Architecture: Elasticsearch and Logstash Combined with Kafka The data processing pipeline consists of four stages: First, we extracted the needed data from the GRIB2 file to a flat CSV file. For this, we have developed custom code to wrap a CLI utility called , provided by the Climate Prediction Center of the US National Weather Service. Next, we buffered each piece of data by pushing it into a Kafka topic. Then we read messages from the Kafka topic using the Logstash input plugin to extract the desired fields and build the JSON document according to our Elasticsearch mapping specification. We have also indexed the geographical location of the power lines using the geoshape datatype. _all: { enabled: false }, _source: { enabled: true, excludes: [rid, debut, fin, location] }, properties: { requestid: { type: keyword }, debut: { type: date, format: yyyy-MM-dd HH:mm:ss }, fin: { type: date, format: yyyy-MM-dd HH:mm:ss }, code: { type: keyword }, valeur: { coerce: false, type: scaled_float, scaling_factor: 1000 }, location: { type: geo_shape, tree: quadtree, precision: 1100m } } Finally, the data could easily be accessed using an Elasticsearch query combining the weather data and the geographical location of the power lines. Key Benefits of Using the Elastic Stack and Kafka Key facts of the performance test Tuning correctly the weather data document mapping with the correct index options helped us reduce the index storage size and the document unitary size by 25%. We have found no memory leaks, even during heavy indexing process. Stopping indexing has made memory turn back to a standard constant value. With correctly configured topic partitions on Kafka, we have tested the Logstash scaling option and we found that having two Logstash pipelines running instead of one made the ingestion process run twice as fast. With only one Logstash pipeline, we reached an ingesting rate of 275,000 documents per minute, adding a second Logstash pipeline doubled the throughput. Filtering millions of documents by intersecting two complex geoshapes in two different indexes took only two seconds per request, which is a very satisfying response time. Outlook: Providing a Robust yet Easy to Use Platform for Weather Data AccessThe POC has demonstrated the feasibility of ingesting millions of records, combining them with geographical locations in a query, and sending back a ready to use data set for the end user, with great indexing and querying performance. Our next challenge is to build a scalable (billions of records over multiple weather sources), easy to use, robust platform for grid experts to access weather data. The goal is to provide a powerful tool to discover, experiment with, and build new prediction models or improve the accuracy of existing ones. 
Eventually, these models will go into production to more accurately predict the effects of the weather on our transmission system assets and help us optimize their availability for the benefit of our customers and the community. Using X-Pack Machine Learning and AlertingThe Elastic Stack combined with the X-Pack alerting and machine learning features can help us identify when the weather prediction values, combined with other metrics, cause a line to deviate from its normal clearance range, and automatically alert on risks of thermal damage and line sag.
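To make the query pattern described in this post concrete, here is a minimal sketch of the kind of request that intersects the indexed weather points with a power line geometry. The index names, document id, and code value are hypothetical; only the location, code, and debut fields come from the mapping shown earlier. %> curl -XGET 'http://<es_url>:<es_port>/weather/_search' -H 'Content-Type: application/json' -d '{ "query": { "bool": { "must": [ { "term": { "code": "wind_speed" } }, { "range": { "debut": { "gte": "2017-11-01 00:00:00" } } } ], "filter": { "geo_shape": { "location": { "indexed_shape": { "index": "power-lines", "type": "doc", "id": "line-42", "path": "location" }, "relation": "intersects" } } } } } }' This mirrors the two-index intersection measured in the performance test: the shape of the power line is read from its own index and used to filter the weather documents.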
Shipping Kubernetes Logs to Elasticsearch with Filebeat;Shipping Kubernetes logs with Filebeat;/blog/shipping-kubernetes-logs-to-elasticsearch-with-filebeat;Carlos Pérez-Aradros;November 27, 2017;Engineering;; We recently wrote about the new Filebeat features to , and since the 6.0 release, you can leverage the same technology when running Kubernetes. Metadata is key When shipping logs from container infrastructure it’s important to include context metadata to ensure we can correlate logs later. This becomes especially important for the Kubernetes case. You may want to watch logs from a full deployment, a namespace, pods with a specific label, or just a single container. Metadata is key to ensure you can filter logs to focus on what’s important to you. Metadata is also useful to correlate events from different sources. When troubleshooting an issue it’s very common to check logs and metrics together; thanks to Kubernetes metadata we can filter both at the same time. Add Kubernetes metadata We use processors across all Beats to modify events before sending them to Elasticsearch. Some of them are used to add metadata, and as part of the 6.0.0 release we added to the list! It enriches logs with metadata from the source container: it adds the pod name, container name and image, Kubernetes labels and, optionally, annotations. It works by watching the Kubernetes API for pod events to build a local cache of running containers. When a new log line is read, it gets enriched with metadata from the local cache. Deployment Shipping logs from Kubernetes with Filebeat is pretty straightforward; we provide to do it. Filebeat is deployed as a , which ensures one agent is running on every Kubernetes node. The Docker logs folder from the host is mounted in the Filebeat container, and Filebeat tails all container logs and enriches them using . To deploy and see it yourself, just follow these simple steps: Logs will start flowing into Elasticsearch, enriched with Kubernetes metadata! You can now use it to filter logs: and try it yourself.
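For reference, here is a condensed sketch of what the Filebeat configuration in those manifests boils down to. Treat it as an illustration rather than the official manifest; the Elasticsearch host variables are placeholders.
filebeat.prospectors:
- type: log
  paths: ["/var/lib/docker/containers/*/*.log"]
  json.message_key: log
  json.keys_under_root: true
processors:
- add_kubernetes_metadata:
    in_cluster: true
output.elasticsearch:
  hosts: ["${ELASTICSEARCH_HOST:elasticsearch}:${ELASTICSEARCH_PORT:9200}"]
With this in place, each event carries fields like the pod name and labels, which is what makes the filtering described above possible.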
Elastic Cloud Enterprise 1.1.1 released;;/blog/elastic-cloud-enterprise-1-1-1-released;Suyog Rao;November 27, 2017;Releases;; We are happy to announce the availability of ECE 1.1.1, a maintenance release. Please see the here or head straight to our to get it! Bug fixesECE 1.1.1 corrects an important upgrade issue that we discovered and last week. Before this release, upgrading from 1.0.2 to 1.1.0 and subsequently restarting allocators or adding more capacity could lead to authentication errors in Kibana. This upgrade issue has now been fixed in 1.1.1. In addition to this bug fix, we added a new configuration parameter which allows you to control timeout values during installation. Existing ECE installations can be live upgraded to version 1.1.1.
Elastic Cloud Enterprise 1.1.0 upgrade issues and workaround;;/blog/elastic-cloud-enterprise-1-1-0-upgrade-issues;Suyog Rao;November 22, 2017;Engineering;; Last week we with new features and bug fixes. Since then, we've discovered a critical bug that leads to Kibana being unavailable post-upgrade. This bug only affects users who upgraded from 1.0.x versions of ECE to 1.1.0. IssueIf you have upgraded to ECE 1.1.0 and any of the following are true: If you have done those things, you will see Kibana authentication issues and unavailability. Monitoring will also be affected. Immediate WorkaroundAs an immediate workaround, we've created a script that can fix this issue on your deployment, which is . You can run this script anywhere that has access to the coordinator node on port . The script requires two pieces of information: Here's what it looks like:[user@localhost ~]$ curl -O https://download.elastic.co/cloud/fix-ece-1.1.0-upgrade.sh [user@localhost ~]$ chmod a+x ./fix-ece-1.1.0-upgrade.sh [user@localhost ~]$ ./fix-ece-1.1.0-upgrade.sh Please enter IP address of the coordinator:<your ip here> Please enter root password to admin console:<your password here>The script will test connectivity, and if successful, make the changes. If not, it will let you know. After that has happened, it will prompt you to remove from every host. This is a manual step, which you can do with this command on every host running ECE:docker rm -f frc-services-forwarders-services-forwarder Once this is removed, ECE will automatically restart the above service with the patched configuration.Official patched version We are working on a 1.1.1 release that addresses this situation. We are targeting a 1.1.1 release early next week which will allow you to upgrade from 1.1.0 or any older versions.If you are planning to upgrade your existing deployment to 1.1.0, we strongly recommend to hold off until we release 1.1.1.We apologize for the inconvenience this may have caused and look forward to releasing 1.1.1 ASAP. Please contact us if you have any questions about this.
Why am I seeing bulk rejections in my Elasticsearch cluster?;;/blog/why-am-i-seeing-bulk-rejections-in-my-elasticsearch-cluster;Christian Dahlqvist;November 22, 2017;Engineering;; Elasticsearch supports a wide range of use-cases across our user base, and more and more of these rely on fast indexing to quickly get large amounts of data into Elasticsearch. Even though Elasticsearch is fast and index performance is continually improved, it is still possible to overwhelm it. At that point you typically see parts of bulk requests getting rejected. In this blog post we will look at the causes and how to avoid them. This is the second installment in a series of blog posts where we look at and discuss your common questions. The first installment discussed and provided guidelines around What happens when a bulk indexing request is sent to Elasticsearch? Let’s start at the beginning and look at what happens behind the scenes when a bulk indexing request is sent to Elasticsearch. When a bulk request arrives at a node in the cluster, it is, in its entirety, put on the bulk queue and processed by the threads in the thread pool. The node that receives the request is referred to as the coordinating node as it manages the life of the request and assembles the response. This can be a node dedicated to just coordinating requests or one of the data nodes in the cluster. A bulk request can contain documents destined for multiple indices and shards. The first processing step is therefore to split it up based on which shards the documents need to be routed to. Once this is done, each bulk sub-request is forwarded to the data node that holds the corresponding primary shard, and it is there enqueued on that node’s bulk queue. If there is no more space available on the queue, the coordinating node will be notified that the bulk sub-request has been rejected. The thread pool processes requests from the queue and documents are forwarded to replica shards as part of this processing. Once the sub-request has completed, a response is sent to the coordinating node. Once all sub-requests have completed or been rejected, a response is created and returned to the client. It is possible, and even likely, that only a portion of the documents within a bulk request might have been rejected. The reason Elasticsearch is designed with request queues of limited size is to protect the cluster from being overloaded, which increases stability and reliability. If there were no limits in place, clients could very easily bring a whole cluster down through bad or malicious behaviour. The limits that are in place have been set based on our extensive experience supporting Elasticsearch for different types of use-cases. When using the HTTP interface, requests that result in at least a partial rejection will return with response code 429, 'Too many requests'. The principle also applies when the transport protocol is used, although the protocol and interface naturally are different. Applications and clients may report these errors back to the user in different ways, and some may even attempt to handle this automatically by retrying any rejected documents. How can we test this in practice? In order to illustrate the practical impact of this behaviour, we devised a simple test where we use to run bulk indexing requests against a couple of with varying numbers of data nodes. Configuration and instructions on how to run Rally are available in . The same indexing workload was run against three different Elastic Cloud clusters. 
We have been indexing with one replica shard configured wherever possible. The clusters consisted of one, two and three data nodes respectively, with each data node having 8GB RAM (4GB heap for Elasticsearch, 4GB native memory). Invoking the API we could see that each data node by default had a fixed bulk thread pool size of two with a queue size of 200: %> curl -XGET http://<es_url>:<es_port>/_nodes/thread_pool "bulk": { "type": "fixed", "min": 2, "max": 2, "queue_size": 200 } During the test we indexed into a varying number of shards (2, 4, 8, 16, and 32) using a varying number of concurrent clients (8, 16, 24, 32, 48, and 64) for each cluster. For every combination of shard and client count we indexed 6.4 million documents with a batch size of 100 documents and another 6.4 million documents with a batch size of 200 documents. This means that in total we attempted to index 384 million documents per cluster. For this test we treat the clusters as a black box, and perform the analysis from the client’s perspective. To limit the scope we will also not look at the impact of various configurations on performance, as that is a quite large topic on its own. All the generated, detailed metrics were sent to a separate Elastic Cloud instance for analysis using Kibana. For each request Rally measures how many of the documents in the bulk request were rejected and how many were successful. Based on this data we can classify each request as successful, partially rejected, or fully rejected. A few requests also timed out, and these have also been included for completeness. Unlike Beats and Logstash, Rally does not retry failed indexing requests, so each run has the same number of requests executed, but the final number of documents indexed varied from run to run depending on the volume of rejections. How does bulk rejection frequency depend on shard count, client count, and data node count? Bulk rejections occur when the bulk queues fill up. The number of queue slots that get used depends both on the number of concurrent requests and the number of shards being indexed into. To measure this correlation we have added a calculated metric, , to each run. This is defined as , multiplied by , and indicates how many queue slots would be needed to hold all bulk sub-requests. In the graph below, we show how the percentage of requests that result in partial or full rejections depends on the client shard concurrency for the three different clusters.
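As a practical aside: if you want to check whether your own cluster is hitting this, the thread pool statistics expose per-node rejection counters. For example, the following calls show the active threads, queue size, and cumulative number of rejected bulk operations per node (a steadily growing rejected count is the symptom discussed in this post): %> curl -XGET http://<es_url>:<es_port>/_cat/thread_pool/bulk?v %> curl -XGET http://<es_url>:<es_port>/_nodes/stats/thread_pool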
Meet the New Elastic Cloud Enterprise UI!;;/blog/meet-the-new-ece-ui;Nicolás Bevacqua;November 21, 2017;Engineering;; A few months ago, Dave Snider (a design lead at Elastic) showed me the latest of what the design team had been working on since July. This was a minimalistic set of UI components that worked well together and, more importantly, had a coherent design. We’re calling it Elastic UI, or EUI for short. The designs are purely CSS based, while the interactive bits have been added in React, but in such a way that we could easily take just the CSS, or make a version of the components using a different JavaScript framework. As we were going through the designs I wasn’t merely impressed, I wanted Elastic Cloud Enterprise to leverage this component system right away! When the team met in Berlin, I was eager to discuss the possibility of releasing this as part of . A minor bump we faced was that EUI was at the time embedded in a branch of Kibana, and thus not readily available to other teams within Elastic that may have wanted to leverage all of the design work that went into it. This was because, at the time of its inception, Kibana was the only intended consumer of EUI. Thankfully they received us with open arms and we were able to extract EUI into its own repository the day after, so that everyone across the company could consume it. Armed with the new repository, we set out to redesign the entire Elastic Cloud Enterprise UI. The plan was to experiment a little and see how far we could get in a week’s time. The problem was that there was an upcoming release in two weeks, and virtually everyone in the team was taking one week vacations at some point before the release. Alas, we excitedly pushed forward. Experimenting with EUI The way React applications are typically built often encourages a highly componentized structure, meaning we would have our own share of reusable components like a , a to require confirmation for dangerous actions, code highlighting components, and so on. By changing these components’ implementations to use EUI, we were able to reimplement large portions of our codebase to rely on EUI without having to change the interfaces of our own reusable components – a huge time saver. Another benefit in doing such a large scale migration with React is that components are easy to swap. Where you had an , you now have an and an statement. Same thing with buttons. Suddenly the Login screen looks great! The first few bits of the redesign involved bringing in the new page layout, and replacing with EUI components all of the styles we had previously implemented ourselves. Things like spacing and responsiveness ended up being the more challenging aspects. The Platform page, for example, gives you an overview of how your entire fleet is doing. One of the nice aspects of EUI is that it has accessibility built into it, so we can be sure to provide an optimal experience to screen readers while using the slick new designs. We’re excited about and can’t wait to hear your feedback! .
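To make the component-swap idea concrete, here is a hypothetical sketch; the component, props, and package name are illustrative placeholders, not the actual ECE code. A reusable confirmation button keeps its interface while being re-implemented on top of EUI, so call sites don't change:
import React from 'react';
import { EuiButton } from '@elastic/eui';

// Same props as before the migration; only the rendering now goes through EUI.
export function DangerButton({ onConfirm, children }) {
  return (
    <EuiButton color="danger" onClick={onConfirm}>
      {children}
    </EuiButton>
  );
}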
Brewing in Beats: Autodiscovery with Docker;;/blog/brewing-in-beats-autodiscovery-with-docker;Monica Sarbu;November 21, 2017;Brewing in Beats;; Autodiscovery - use Docker events to auto-configure BeatsThe autodiscovery feature with the first provider (for Docker) has been . Autodiscovery allows the user to define different providers, that watch for system changes and emit events to a common bus. Then the autodiscovery module detects situations when there is something we can monitor and instantiates new modules for it. The first provider watches for Docker events. It supports config mapping from container metadata to config templates, so new modules are created when a container starts. metricbeat.autodiscover: providers: - type: docker templates: - condition: equals.docker.container.image: redis config: - module: redis metricsets: [info, keyspace] hosts: ${data.host}:${data.port} The above is an example configuration that instantiates the Metricbeat Redis module every time a new Redis container (defined by having the redis image) is started. Note that the connection information (host/ports) is filled in by the autodiscovery support via a template. This feature will be released in Metricbeat and Filebeat 6.1. Configure the number of routing shards in the Elasticsearch templateElasticsearch 6.1 will have an API for . To enable this feature, applications like Beats need to set the config option at index time creation. The actual number of shards must be a factor of the number of routing shards. This adds configuration option in Beats for the routing shards, with a default of 30. We chose 30 as it is a multiple of 1, 3 and 5, our current number of default shards in Beats and ES. The new configuration option will be present in Beats 6.1. Packetbeat: support for reading TLS envelopesThis PR adds , which is one of the most anticipated Packetbeat features. It doesn’t mean decrypting traffic, but it parses the initial handshake and extracts data like ciphers supported by the client and the server, the client and server certificate chains, the subject alternative name (SAN), validity dates, raw certificates, and so on. This data is super valuable for debugging TLS issues and also for intrusion detection and auditing. The implementation also comes with support for the extension to TLS, which allows Packetbeat to detect, for example, whether HTTP/2 or HTTP/1 are used as an application protocol on top of the TLS connection. This feature will be released in Packetbeat 6.1. Filebeat: Docker JSON-file prospectorThis PR adds an (experimental) written by the default JSON logging driver. Filebeat could already read Docker logs via the prospector with JSON decoding enabled, but this new prospector makes things easier for the user. It abstracts the format, so there is no need to manually configure JSON decoding. Here is an example config, which captures the logs from a single container specified by its ID: prospectors: - type: docker containers.ids: - c3ec7a0bd9640151a768663b7e78c115d5b1a7f87fba572666bacd8065893d41 It also parses the timestamp from the JSON file, something that wasn’t possible with Filebeat alone (it required Logstash or Ingest Node). This new prospector will be released with Filebeat 6.1. 
Other changesRepository: elastic/beatsAffecting all BeatsChanges in master: MetricbeatChanges in master: FilebeatChanges in master: HeartbeatChanges in master: AuditbeatChanges in master: TestingChanges in master: Changes in 6.0: DocumentationChanges in master: Changes in 5.6: Changes in 6.0: Repository: elastic/gosigarChanges in master:
Machine Learning Anomaly Scoring and Elasticsearch - How it Works;;/blog/machine-learning-anomaly-scoring-elasticsearch-how-it-works;Rich Collier;November 20, 2017;Engineering;; We often get questions about Elastic’s Machine Learning “anomaly score” and how the various scores presented in the dashboards relate to the “unusualness” of individual occurrences within the data set. It can be very helpful to understand how the anomaly score is manifested, what it depends on, and how one would use the score as an indicator for proactive alerting. This blog, while perhaps not the full definitive guide, will aim to explain as much practical information as possible about the way that Machine Learning (ML) does the scoring. The first thing to recognize is that there are three separate ways to think about (and ultimately score) “unusualness” - the scoring for an individual anomaly (a “record”), the scoring for an entity such as a user or IP address (an “influencer”), and the scoring for a window of time (a “bucket”). We will also see how these different scores relate to each other in a kind of hierarchy. Record Scoring The first type of scoring, at the lowest level of the hierarchy, is the absolute unusualness of a specific instance of something occurring. For example: Each of the above occurrences has a calculated probability, a value that is calculated very precisely (to a value as small as 1e-308) - based upon the observed past behavior which has constructed a baseline probability model for that item. However, this raw probability value, while certainly useful, can lack some contextual information like: Therefore, to make it easier for the user to understand and prioritize, ML normalizes the probability such that it ranks an item’s anomalousness on a scale from 0-100. This value is presented as the “anomaly score” in the UI. To provide further context, the UI attaches one of four “severity” labels to anomalies according to their score - “critical” for scores between 75 and 100, “major” for scores of 50 to 75, “minor” for 25 to 50, and “warning” for 0 to 25, with each severity denoted by a different color. Here we see two anomaly records displayed in the Single Metric Viewer, with the most anomalous record being a “critical” anomaly with a score of 90. The “Severity threshold” control above the table can be used to filter the table for higher severity anomalies, whilst the “Interval” control can be used to group the records to show the highest scoring record per hour or day. If we were to in ML’s API to ask for information about anomalies in a particular 5 minute time bucket (where was the name of the job): We would see the following output: Here we can see that during this 5-minute interval (the of the job) the record_score is 90.6954 (out of 100) and the raw is 1.75744e-11. What this is saying is that it is very unlikely that the volume of data in this particular 5 minute interval should have an actual rate of 179 documents because “typically” it is much lower, closer to 60. Notice how the values here map to what’s shown to the user in the UI. The value of 1.75744e-11 is a very small number, meaning it is very unlikely to have happened, but the scale of the number is non-intuitive. This is why projecting it onto a scale from 0 to 100 is more useful. The process by which this normalization happens is proprietary, but is roughly based on a quantile analysis in which probability values historically seen for anomalies in this job are ranked against each other. 
Simply put, the lowest probabilities historically for the job get the highest anomaly scores. A common misconception is that the anomaly score is directly related to the deviation articulated in the “description” column of the UI (here “3x higher”). The anomaly score is purely driven by the probability calculation. The “description” and even the value are simplified bits of contextual information in order to make the anomaly easier to understand. Influencer Scoring Now that we’ve discussed the concept of an individual record’s score, the second way to consider unusualness is to rank or score entities that may have contributed to an anomaly. In ML, we refer to these contributing entities as “influencers”. In the above example, the analysis was too simple to have influencers - since it was just a single time series. In more complex analyses, there are possibly ancillary fields that have an impact on the existence of an anomaly. For example, in an analysis of a population of users’ internet activity, in which the ML job looks at unusual bytes sent and unusual domains visited, you could specify “user” as a possible influencer since that is the entity that is “causing” the anomaly to exist (something has to be sending those bytes to a destination domain). An influencer score will be given to each user, dependent on how anomalous each was considered in one or both of these areas (bytes sent and domains visited) during each time interval. The higher the influencer score, the more that entity will have contributed to, or is to blame for, the anomalies. This provides a powerful view into the ML results, particularly for jobs with more than one detector. Note that for all ML jobs, a built-in influencer called is always created in addition to any influencers added during creation of the job. This uses an aggregation of all the records in the bucket. In order to demonstrate an example of influencers, an ML job is set up with two detectors on a data set of API response time calls for an airline airfare quoting engine: with specified as an influencer. Taking a look at the results in the “Anomaly Explorer”: The top scoring influencers over the time span selected in the dashboard are listed in the “Top influencers” section on the left. For each influencer, the maximum influencer score (in any bucket) is displayed, together with the total influencer score over the dashboard time range (summed across all buckets). Here, airline “AAL” has the highest influencer score of 97, with a total influencer score sum of 184 over the whole time range. The main timeline is viewing the results by influencer and the highest scoring influencer airline is highlighted, again showing the score of 97. Note the scores shown in the “Anomalies” charts and table for airline AAL will be different to its influencer score, as they display the “record scores” of the individual anomalies. When querying the API at the influencer level: the following information is returned: The output contains a result for the influencing airline AAL, with the of 97.1547 mirroring the value displayed in the Anomaly Explorer UI (rounded to 97). The value of 6.56622e-40 is again the basis of the (before it gets normalized) - it takes into account the probabilities of the individual anomalies that particular airline influences, and the degree to which it influences them. Note that the output also contains an of 98.5096, which was the score when the result was processed, before subsequent normalizations adjusted it slightly to 97.1547. 
This occurs because the ML job processes data in chronological order and never goes back to re-read older raw data to analyze/review it again. Also note that a second influencer, airline AWE, was also identified, but its influencer score is so low (rounded to 0) that it should be ignored in a practical sense. Because the is an aggregated view across multiple detectors, you will notice that the API does not return the actual or typical values for the count or the mean of response times. If you need to access this detailed information, then it is still available for the same time period as a record result, as shown before. Bucket Scoring The final way to score unusualness (at the top of the hierarchy) is to focus on time, in particular, the bucket_span of the ML job. Unusual things happen at specific times and it is possible that one or more (or many) items can be unusual together at the same time (within the same bucket). Therefore, the anomalousness of a time bucket is dependent on several things: Note that the calculation behind the bucket score is more complex than just a simple average of all the individual anomaly record scores, but will have a contribution from the influencer scores in each bucket. Referencing back to our ML job from the last example, with the two detectors: When looking at the “Anomaly Explorer” Notice that the “overall” lane in the “Anomaly timeline” at the top of the view displays the score for the bucket. Be careful, however. If the time range selected in the UI is broad, but the ML job’s is relatively short, then one “tile” on the UI may actually be multiple buckets aggregated together. The selected tile shown above has a score of 90, and that there are two critical record anomalies in this bucket, one for each detector with record scores of 98 and 95. When querying the API at the bucket level: the following information is present: Notice, especially, in the output the following: Using Anomaly Scores for Alerting So, if there are three fundamental scores (one for individual records, one for influencers, and one for the time bucket), then which would be useful for alerting? The answer is that it depends on what you are trying to accomplish and the granularity, and thus rate, of alerts that you wish to receive. If on one hand, you are attempting to detect and alert upon significant deviations in the overall data set as a function of time, then the bucket-based anomaly score is likely most useful to you. If you want to be alerted on the most unusual entities over time, then you should consider using . Or, if you are attempting to detect and alert upon the most unusual anomaly within a window of time, you might be better served by using the individual as the basis for your reporting or alerting. To avoid alert overload, we recommend using the bucket-based anomaly score because it is rate limited, meaning that you’ll never get more than 1 alert per bucket_span. On the other hand if you focus on alerting using the , the number of anomalous records per unit time is arbitrary - with the possibility that there could be many. Keep this in mind if you are alerting using the individual record scores. Additional reading: , Machine Learning Docs
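As a sketch of what bucket-based alerting could be driven by (the job name and the threshold of 75 are examples), the ML results API can return only the buckets whose anomaly score exceeds a given value: %> curl -XGET 'http://<es_url>:<es_port>/_xpack/ml/anomaly_detectors/<job_name>/results/buckets' -H 'Content-Type: application/json' -d '{ "anomaly_score": 75, "sort": "anomaly_score", "desc": true }' Anything returned by such a query corresponds to a “critical” bucket, which maps naturally onto the rate-limited, bucket-based alerting approach recommended above.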
Removal of Mapping Types in Elasticsearch 6.0;Removal of mapping types in Elasticsearch 6.0;/blog/removal-of-mapping-types-elasticsearch;Clinton Gormley;November 16, 2017;Engineering;;
Brewing in Beats: Monitor Logstash with Beats;;/blog/brewing-in-beats-monitor-logstash-with-beats;Monica Sarbu;November 16, 2017;Brewing in Beats;; Monitor Logstash with Metricbeat With this , Metricbeat gets a new module for monitoring Logstash. The module is in its early stages, more metrics will be added in future PRs. The module is experimental and it will be released in 6.1. Metricbeat: Export more Ceph informationThanks to , the Ceph module exports , that is important for understanding and monitoring the structure of a Ceph cluster. It collects the weight, status, primary_affinity and other useful info of each OSD node. Other changes Repository: elastic/beatsAffecting all BeatsChanges in master: Changes in 6.0: MetricbeatChanges in master: PacketbeatChanges in master: HeartbeatChanges in 6.0: TestingChanges in master: Changes in 6.0: DocumentationChanges in master: Changes in 5.6: Changes in 6.0:
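Going back to the new Logstash module at the top of this post, a minimal configuration sketch might look like the following; the metricset names and values shown here are assumptions for illustration, with 9600 being Logstash's default monitoring API port:
metricbeat.modules:
- module: logstash
  metricsets: ["node", "node_stats"]
  period: 10s
  hosts: ["localhost:9600"]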
Elasticsearch 6.0.0 GA released;Elasticsearch 6.0.0 GA released;/blog/elasticsearch-6-0-0-released;Clinton Gormley;November 14, 2017;Releases;; With 2236 pull requests by 333 committers added since the release of Elasticsearch 5.0.0, we are proud to announce the release of , based on . A big thank you to all the who tested early versions and opened bug reports, and so helped to make this release as good as it is.
Beats 6.0.0 GA released;;/blog/beats-6-0-0-released;Monica Sarbu;November 14, 2017;Releases;; Today is a big day for Elastic and the entire community: the Elastic Stack 6.0 is now generally available (GA). In this blog post we’ll highlight the main new features of Beats 6.0. If you’ve been following the Alpha and Beta blog posts, you already know what’s in, so we won’t stand in your way. Go to the Beats product page to . Before upgrading from Beats 5.x, review the docs and the guide. If you are planning to upgrade the whole Elastic Stack at once, we also have a for that. Logs and metrics out of Kubernetes and DockerFilebeat and Metricbeat have gained several processors and modules that make container observability with the Elastic Stack a breeze. You can use the and processors to enhance application logs and metrics with Docker and Kubernetes metadata. These processors query the Docker and Kubernetes APIs and enhance the events with the container name, image, pod name, labels, and so on. Depending on the Beat, these processors can use different logic to obtain the metadata. For example Filebeat takes the path of the log file, extracts the container ID from it, and uses the ID to retrieve metadata about the container/pod from which the log message originated. See this blog post about . Metricbeat gets a new Kubernetes module, which works by periodically interrogating the Kubernetes API. It gives you details about the running pods and containers, including the CPU usage, memory usage, bytes exchanged over the network, and info about the file system. The sample Kibana dashboard provided with the module shows you at a glance the monitoring status of your Kubernetes cluster. We also provide Kubernetes deployment manifests for Metricbeat and Filebeat. You can find more details about deploying Filebeat and Metricbeat on Kubernetes in the docs, but as a sneak peek, it’s as easy as: curl -L -O https://raw.githubusercontent.com/elastic/beats/6.0/deploy/kubernetes/filebeat-kubernetes.yaml # edit the YAML file to set the Elasticsearch connection information kubectl create -f filebeat-kubernetes.yaml The above commands install Filebeat as a DaemonSet, ensuring one agent is running on each Kubernetes node, and configure Filebeat to pick up the logs from , unwrap them from the JSON objects, and automatically enhance them with Kubernetes metadata. Auditbeat - easy operational securityYou can think of Auditbeat as a friendlier version of that is perfectly integrated with the Elastic Stack. It is based on the Linux audit framework, which means it can hook into every system call and capture them under particular conditions. You can use Auditbeat to very efficiently detect things like short-lived connections and processes, unauthorised attempts to open files, privilege escalations, and so on. Auditbeat automatically correlates events together, resolves UIDs into names, and outputs JSON objects directly to Elasticsearch or Logstash. Auditbeat also has a file integrity module. It watches files and directories for changes, and when a file changes it computes the MD5, SHA1, and SHA256 hashes and publishes them to Elasticsearch. The hashes can be compared against known malicious files, as shown in this . This functionality is available on Windows, macOS, and Linux. New commands and configuration layoutWe rethought the way the Metricbeat and Filebeat modules are enabled and configured. Instead of one huge configuration file, we now have a directory with individual configuration files for each module. 
The Beats also get commands to list, enable, or disable modules. For example: $ metricbeat modules list $ metricbeat modules enable redis $ metricbeat modules disable redis And these are not the only useful new commands. There are also commands to export the configuration, export the Elasticsearch mapping template, do a test fetch for a Metricbeat module, and even test the connectivity with Logstash or Elasticsearch. More efficient Metricbeat storageWe took a good look at the data that Metricbeat generates and how it is stored in Elasticsearch, and worked our way towards using considerably less storage while still providing the same, or almost the same, value. We have changed the default number of shards to 1 for the Metricbeat indices, because the amount of data typically doesn’t require more. We have added a new feature that captures only the first N processes by memory and CPU time, instead of all processes, which significantly reduces the number of documents created. And we have reduced the polling frequency for the more static metricsets, like the filesystem one. These changes are complemented nicely by the storage efficiency improvements that went into Elasticsearch 6.0, especially the . Together, the improvements add up to 85% less storage used by Metricbeat 6.0 in the default configuration compared to Metricbeat 5.5. Internal pipeline refactoring and better performanceBeats 6.0 also comes with a refreshed internal pipeline architecture. While this change is mostly internal, meant to simplify and improve the Beats overall architecture, it does have some visible effects. The new pipeline is asynchronous by default, meaning, for example, that while Filebeat is waiting for a network acknowledgement from Logstash/Elasticsearch, it continues to read and process lines from disk. This brings an increase in the maximum throughput. The Filebeat internal spooler is removed, as its functionality is covered by the new pipeline. This means that tuning Filebeat for performance is easier (only one internal queue size to play with). Be aware that this also implies some breaking changes in the configuration file. Another effect of the pipeline refactoring is that you can no longer enable two outputs at the same time. In previous versions of the Beats, it was possible to simultaneously enable multiple outputs of different types (e.g. one Logstash and one Elasticsearch), but not multiple outputs of the same type. In a back-pressure sensitive shipper, like Filebeat, having multiple outputs generally means that the slowest output decides the rhythm. This is not ideal and often took our users by surprise. We therefore removed the possibility, which also simplified our internal architecture, and instead we recommend using multiple Filebeat instances or using Logstash as an intermediary with multiple outputs. Tell me moreBesides the highlights above, Beats 6.0 comes with a ton of other small features, improvements, and modules. Please read the for details. Thank you and community creditsThe 6.0 release was a big undertaking, and it wouldn’t have been possible without the continuous support from our community. We’d like to thank everyone who has contributed code or documentation, but also everyone who has tried out the alpha and beta releases. We’d like to nominate the following community members in particular: On behalf of the whole community: Thank you and enjoy 6.0!
Elastic Stack 6.0.0 GA is Released;;/blog/elastic-stack-6-0-0-released;Tyler Hannan;November 14, 2017;Releases;de-de,fr-fr,ja-jp,ko-kr,zh-chs; 6.0.0 is here. Not much more needs saying. You should download it now or use it on (your favourite hosted Elasticsearch and Kibana provider.) If you haven’t followed the release cadence over the past few months, you may be surprised by today’s announcements. Today represents the culmination of thousands of pull requests and the effort of hundreds of committers. This has culminated in two alpha releases, two beta releases, two release candidates, and – finally – general availability (GA). This milestone would have been impossible to achieve without the effort of a variety of teams within Elastic. And, importantly, the perspective and feedback of our users who chose to participate in the . Not only are we releasing the entirety of the Elastic Stack, today also marks the release of that includes 6.0 support, offline installation, and a variety of UX changes simplifying the provisioning, management, and monitoring of clusters. And, because just GA’ing multiple products on the same day isn’t enough…APM is still in Alpha and we invite your participation in testing it on 6.0.0. When there are so many features to highlight in a release, where do you even start? Either you craft the next great novel or you choose to provide links to details. Happy reading and…more importantly…happy searching, analyzing, and visualizing. We are also hosting a , with Shay and Elastic engineering leads. Join us for 6.0 demos, AMAs, and more during our live 6.0 launch celebrations. Elasticsearch An entirely new zero-downtime upgrade experience, the addition of fast, op-based recovery thanks to sequence IDs, improved handling of sparse data, faster query times, distributed Watch execution, and the list keeps going. The summary, or aggregation, of features is in the . Kibana A Dashboard Only mode, a Full Screen mode, the ability to export saved searches to .csv, Alert creation via a UI in X-Pack Gold and above, a migration assistant in X-Pack Basic, and all of it is made more accessible via contrast changes, keyboard accessibility enhancement and more. Visualize the future of interacting with your data in the . Logstash Multiple, self-contained pipelines in the same Logstash instance and the addition of UI components - Pipeline Viewer in X-Pack Basic and Logstash pipeline management in X-Pack Gold. Grok the details in the . Beats Beats <3 containers and, also, Beats <3 modules (and improving the dashboard for those modules). Combine this with a new commands and configuration layout generally and more efficient storage in Metricbeat. Also, say ‘Heya’ to Auditbeat. If you want all the details, ‘Go’ read the . ES-Hadoop First class support for Spark’s Structured Streaming has landed in 6.0, alongside a re-write of connector mapping code to better support multiple mappings. Support for reading and writing the new join fields has been added as well. Users can also now take advantage of script types other than inline for update operations. This is just a reduced view though, so map your eyes to the details in the . Get It Now!
Elasticsearch for Apache Hadoop 6.0.0 GA is Released;;/blog/es-hadoop-6-0-0-released;James Baiera;November 14, 2017;Releases;; Major releases don’t come every day, which is why I am astonishingly excited to announce the release of Elasticsearch for Apache Hadoop (aka ES-Hadoop) built and tested against the latest and greatest Elasticsearch 6.0.0. This release has been a culmination of a monumental effort across Elastic as well as our awesome community. A special thank you to all who checked out the preview releases and provided invaluable feedback on them. And now, on to the shiny new stuff! What’s new? Spark 2.2.0 and Stable Support for Spark Structured Streaming Spark 2.2.0 landed on July 11th and we spared no time in making sure we work seamlessly with it. What’s with all the excitement? Why, is no longer an “Experimental” feature in this release! This means that we’re treating our Structured Streaming integration in ES-Hadoop as an evolving integration as of this beta release. Please note that due to its experimental nature in prior versions, we will only be supporting our Structured Streaming integration on Spark versions 2.2.0 and above. Don’t fret though - this doesn’t impact our existing Spark integrations at all. RIP Elasticsearch on YARN The Elasticsearch on Apache YARN (ES-on-YARN) beta integration has been removed in this release. ES-on-YARN was an experiment for deploying Elasticsearch on top of Hadoop’s YARN cluster resource negotiator. The project was never recommended for production use and has been in perpetual beta status since its inception. The core limitations for the project have been YARN’s lack of formal support for long-running services, which is a requirement for Elasticsearch to achieve production level stability. The ecosystem around long-running services in YARN has improved since the start of the beta, but much of the improvement is based in systems that sit on top of YARN like Apache Slider. These systems are still fairly young and would require quite a bit of work to migrate toward. With all this in mind, we have decided to cease development of the ES-on-YARN project. We’re always eager to hear your feedback, so if you have any about ES-on-YARN make it known on the . Have no fear though. When one door closes, another one opens: for users looking to easily orchestrate and manage a fleet of Elasticsearch clusters, either on-prem or in the cloud, is the recommended solution. Support for new Join Fields The days are numbered for Multi-typed indices in Elasticsearch. Users who work with Parent-Child based data need not worry about the future due to the advent of the new “join” field type in Elasticsearch. We’ll be rolling out support for reading and writing data with this new field type in this release. We’re excited to hear your feedback on this new feature! Multiple Mappings and Multiple Index Reads We took a long hard look at how we handle Elasticsearch mappings in the connector. After that long hard look we re-wrote a healthy chunk of code to fix an unhealthy bunch of problems. In this release you will no longer be bitten by common errors when reading from multiple indices (each with varying field types). ES-Hadoop will also alert you when the indices you’re reading from have conflicting mappings in them. 
Check Out Our Bug Collection: Nested Java Bean serialization problems, field exclusion problems on Pig and SparkSQL, partial document reads and serialization exceptions, parsing errors from index auto-creation, backwards compatibility errors with scroll IDs, missing support for timestamps in params, and much more, all fixed in this release. Take a look at in this release!
Kibana 6.0.0 is released;;/blog/kibana-6-0-0-released;Jim Goodwin;November 14, 2017;Releases;; With 1,280 pull requests by 208 contributors added since the release of Kibana 5.0.0, we are proud and happy to announce the release of . We'd like to thank all the who tested early versions and reported bugs helping to make this a great release of Kibana! This release has a lot of new features including: Did someone say CSV export? We’re pretty sure we heard someone ask for CSV export. Just to be safe, we built CSV export. Search for the documents you want to export in the Discover app, and then export matching documents as a CSV file via the reporting menu. CSV export comes with X-Pack basic, which is our free license. In 6.0 we made changes across Kibana to improve Accessibility, one of those efforts is to make the colors in the UI have appropriate contrast for folks who have different forms of color blindness. We've redone the styling for Kibana to address these issues. Here are some sample screens: We've also improved screen reading and keyboard navigation throughout Kibana: [Continue reading: ] We've introduced a new UI for creating and editing alerts based on thresholds. It includes a builder experience with type-ahead suggestions and graphical feedback based on previewing the alert constraints. It supports sending alert messages with template values to the log, email, or slack. See the demonstration animation below for a quick look at the new functionality: [] You can now enter full screen mode when viewing a dashboard. This hides the browser chrome and the top nav bar. If you have any filters applied, you'll see the filter bar, otherwise that will be hidden as well. To exit full screen mode, hover over and click the Kibana button on the lower left side of the page, or simply press the ESC key. This mode complements the Dashboard Only Mode introduced in alpha2, and together they make a great solution for monitors in NOCs, SOCs and other Kiosks around the office! Ever wish you could share your Kibana dashboards without the risk of someone accidentally deleting or modifying them? Do you want to show off your dashboards without the distraction of unrelated applications and links? In version 6.0 we’re making it easier than ever to set up a restricted access user, with limited visibility into Kibana. It’s already possible to create read only users, but new in 6.0 is a UI to match, and we’ve made it simple to set up. All you have to do is assign the new, reserved, built-in kibana_dashboard_only_user role, along with the appropriate data access roles, to your user and they will be in dashboard only mode when they log in to Kibana. [Continue reading: ] Cluster Alerts in Monitoring was added in the 5.4 release, but until now the alerts only appeared on the Overview page of the Monitoring app. This new feature allows you to receive email notifications when the alerts are triggered. To use it, go to the Advanced Settings page in Kibana Management, enter an email address for `xpack:defaultAdminEmail`, and click Save: The built-in alerts will send an email to that address when they initially trigger, and when they're resolved. Using this feature does require that your Elasticsearch nodes are configured for the ability to send emails from watches. 
If you haven't set that up yet, take a look at the X-Pack documentation for Configuring Email Accounts: When we released the first phase of Cluster Alerts, we promised there would be more alert types to come, and we're delivering on that promise with the new X-Pack License Expiration alert. This alert will tell you when your X-Pack license is close to expiration. It starts as a low-priority alert when expiration is 30 days away, becomes a medium-priority alert when expiration is 15 days away, then becomes a high-priority alert when expiration is 7 days away. In we introduce an experimental Kibana Query Language. It is disabled by default and can be enabled through the Kibana configuration. Kibana currently provides four different search mechanisms with overlapping responsibilities: Exposing the Lucene query syntax and the query DSL to users creates a few problems. Since we don't control the query syntax, we can't implement features that would require introspection into a user's query. This includes things like: We could solve these problems by building a model in Kibana to represent raw Elasticsearch queries, but there are other advantages to building our own query language: So, we hope you'll turn on the Kibana Query Language, give it a spin, and send us feedback! [Continue reading: ] When creating new visualizations, developers are no longer restricted to using just Angular as a rendering technology. The code now also enables developers to create custom editors that do not conform to the current sidebar layout. Commonly used functionality - such as access to the query bar or time filter - is now also exposed on the visualization object. This avoids the need to import individual modules from inside Kibana. These changes are a first step in a longer-term effort to provide a robust, long-lived programming interface for building visualizations in Kibana. [Watch the webinar: ] Please , try it out, and let us know what you think on Twitter () or in our . You can report any problems on the .
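As a minimal sketch of setting up the dashboard-only mode described above, a user can be created with the reserved kibana_dashboard_only_user role plus a data-access role via the X-Pack security API (the username, password, and data-access role below are hypothetical, and the endpoint is shown as we remember it for the 6.0 X-Pack native realm, so please verify it against the documentation):
PUT /_xpack/security/user/noc_viewer
{
  "password": "a-strong-password",
  "roles": [ "kibana_dashboard_only_user", "logs_read_role" ],
  "full_name": "NOC wallboard user"
}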
Elastic Cloud Enterprise 1.1.0 released;;/blog/elastic-cloud-enterprise-1-1-0-released;Suyog Rao;November 14, 2017;Releases;; We are pleased to announce that ECE 1.1 is available to ! ECE 1.1 is packed with important enhancements and bug fixes, so please read on. You can find the release notes for ECE 1.1 .
Logstash 6.0.0 GA Released;Logstash 6.0.0 Released;/blog/logstash-6-0-0-released;Andrew Cholakian;November 14, 2017;Releases;; We’re glad to announce that Logstash 6.0.0 has launched! Today marks the first day of 6.0’s inter-planetary mission of making life easier for systems administrators and engineers. Can’t wait another minute to try it? Head over to our and give it a shot! However, you may want to take a minute to read about and first. Read on for what's new in Logstash 6.0! Streamline Processing with Multiple Pipelines: One pipeline no longer rules them all. Logstash 6.0 introduces the ability to run multiple pipelines concurrently for different use cases. The pipelines run together in the same instance, but with independent inputs, filters, and outputs, enabling users to isolate processing logic per data source. This keeps your pipeline logic focused and concise. Historically, many Logstash users combined multiple use cases into a single pipeline, which required adding complex conditional logic. With multiple pipelines that’s no longer necessary: now you can organize your config more cleanly and execute your pipelines more efficiently by using a dedicated pipeline per use case. To add to the fun, each pipeline also has its own independent settings and lifecycle, which you can tune to match the workload profile of that data source. For instance, you may want to allocate more pipeline worker threads for a high-volume logging pipeline, and throttle back resources for a lower-intensity local metrics pipeline. For more details on multiple pipelines, take a peek at the and ! Manage Pipelines Centrally with the Elastic Stack: In the past, managing pipeline configurations was either a manual task, or you would use config management tools like Puppet or Chef to assist with operational automation. With Logstash 6.0, the centralized pipeline management feature now enables you to manage and automatically orchestrate your Logstash deployments directly with the Elastic Stack through the Kibana single pane of glass. This feature brings a Pipeline Management UI to Kibana, which you can use to create, edit, and delete pipelines. Underneath this UI we use Elasticsearch to store your pipeline configurations. With a few simple settings, your Logstash nodes can be configured to watch for changes on these pipelines, letting you seamlessly push out pipeline changes without additional operational infrastructure beyond the Elastic Stack. Centralized pipeline management is available as an X-Pack feature. To learn more, take a look at the and the ! Visualize Pipeline Logic and Performance: We’re also proud to announce the long-awaited in this 6.0 release, which is a welcome addition to the Logstash Monitoring UI. With this new tool you can both visualize your pipeline configuration and troubleshoot performance bottlenecks at the plugin level. The pipeline viewer displays your Logstash pipelines as a DAG (Directed Acyclic Graph). Overlaid on this DAG are relevant performance metrics for individual inputs, filters, and outputs. Potential performance problems are highlighted to let you quickly determine which portions of your pipeline may be bottlenecks. Users should note that this is a beta feature and may be subject to change. One known issue is that very large pipelines do not yet render cleanly.
This is an area of active work, so expect significant improvements as we refine this feature and move it into general availability. A Smooth Path from Ingest Node to Logstash: Some users enjoy the convenience of Elasticsearch Ingest Nodes when getting started with the Elastic Stack. However, at a certain level of complexity, Ingest Nodes may not have all the features required to solve your problem, and a migration to Logstash may be in order. To ease this transition we’ve created an that ships with Logstash 6.0. The converter takes an Ingest Node pipeline as input and emits the equivalent Logstash pipeline. See the for more insight! Now With JRuby 9k: We’ve spent a lot of time moving the Logstash project over to the latest major version of JRuby: JRuby 9000, which has support for modern Ruby syntax and enhanced internals more amenable to optimization. For plugin developers, this means that your old plugins will continue to work in all but the rarest of cases. This upgrade also means that you can use new Ruby features going forward, as well as use Ruby libraries that are only compatible with JRuby 9k.
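To sketch the multiple pipelines feature described earlier in this post, pipelines are declared in a pipelines.yml file, one entry per pipeline with its own settings (the pipeline IDs, config paths, and worker counts below are illustrative assumptions, not a recommended configuration):
- pipeline.id: apache-logs
  path.config: "/etc/logstash/conf.d/apache.conf"
  pipeline.workers: 4
- pipeline.id: local-metrics
  path.config: "/etc/logstash/conf.d/metrics.conf"
  pipeline.workers: 1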
SHA-512 checksums for Elastic Stack artifacts;;/blog/sha512-checksums-for-elastic-stack-artifacts;Maxime Greau;November 13, 2017;Engineering;; Each time we do a release of the Elastic Stack, are generated. Until now, we provided SHA-1 checksum files alongside those artifacts to verify file integrity. The , announced this year, has increased awareness of the need to move to a safer alternative for all applications that rely on SHA-1 for digital signatures, file integrity, or file identification. That’s what we have done by using , . Elastic Stack 5.6.2+: SHA-1 and SHA-512 checksums While we encourage you to move to the SHA-512 checksum files quickly, we still generate the SHA-1 checksums for 5.6.x releases for backward compatibility. Elastic Stack 5.6.2 was the first version released with both SHA-1 and SHA-512 checksum files available. The same format is used to produce the contents of the and files. Once you have downloaded one Elastic Stack artifact, e.g. Kibana 5.6.3, and its related checksum files from , you can check the file integrity: Then you have to write a script to check that the checksum downloaded from is the same as the one generated locally, based on the binary artifact downloaded. Elastic Stack 6.0.0 checksums: new SHA-512 format, no more SHA-1 With the upcoming Elastic Stack 6.0.0 release, we have decided to: You can already validate the integrity of Elastic Stack files with the new SHA-512 file format with the last week: The new format, used by files, contains the hash value and the associated artifact’s filename on the same line (separated by two spaces): Now, with this format, it’s easier to check the integrity of a downloaded Elastic Stack artifact, by simply using the option: If the SHA-512 checksum matches, is printed alongside the filename, while a message will be printed to the standard output in case of error: Elasticsearch plugins The Elasticsearch team has updated that checks the integrity of the plugin files before installing them. All Elasticsearch official plugins have been updated with the new SHA-512 checksums. The expected behavior is based on the above explanations in this article, so: Conclusion To summarize, we encourage you to always check the integrity of the files you download. If you already have a script doing that with the Elastic Stack artifacts, please for all Elastic releases.
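As a brief illustration of the verification flow described above, using the standard shasum tool (the artifact filename is only an example; substitute whatever you actually downloaded):
# verify the downloaded artifact against its published SHA-512 checksum file
shasum -a 512 -c kibana-6.0.0-linux-x86_64.tar.gz.sha512
# or compute the hash yourself and compare it to the published value
shasum -a 512 kibana-6.0.0-linux-x86_64.tar.gz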
Monitoring the Dark Army with Kibana on Mr. Robot;Monitoring the Dark Army with Kibana on Mr. Robot;/blog/monitoring-the-dark-army-with-kibana-mr-robot;Renuka Hermon;November 10, 2017;Culture;; Elliot uses Kibana to visualize the Dark Army’s efforts to steal data in eps3.4_runtime-error.r00. Elastic users across the globe do a double take. We try to calmly write a blog post. Mr. Robot is a company-wide favorite for reasons that are likely obvious. The show is well-known for depicting cybersecurity scenarios with realistic detection strategy, tools, and response. So when the Mr. Robot team reached out to us, we were thrilled to hear that they wanted to feature Kibana in an upcoming episode. We, of course, said “yes!” We couldn’t wait to see our software in action in Elliot's world. By the way, you’ll notice that it’s not the latest version of Kibana, but to fit with the show's timeline. In true form, the technical minds at Mr. Robot wanted to build an authentic Kibana dashboard populated with data, created using the real tools. Can you see why we are fans? , Technical Consultant for Mr. Robot, indulges us all in the cybersecurity background behind each episode. In his latest post, he dives into how he made a Kibana dashboard that depicts malicious activity performed by the Dark Army. To start, Ryan built an ELK (today referred to as the Elastic Stack) VM comprised of Elasticsearch, Logstash, and Kibana. Then he populated it with data from Windows and Linux systems. For more behind-the-scenes commentary about how he built each section of the dashboard, head over to . We love hearing about all the ways users put the Elastic Stack to work. Seeing Elliot use Kibana to visualize the Dark Army’s attacks, well, it was pretty surreal. Here’s a taste of reactions from fans on our team: “Mr. Robot's writers and consultants have created a refreshing, technically honest show. They don't skimp on the detail, but it's all approachable. It rings familiar. I am humbled they chose to use the Elastic Stack.” - Nick Waringa, Senior Security Analyst “Our whole family set a Mr. Robot weekend-binge-watch record in preparation for the Kibana episode.” - Beth McAnerney, Netsuite Administrator “The last time I was this excited to be a nerd was when I read Snow Crash!” - Mark Walkom, Solutions Architect / Developer Advocate “Finally I work somewhere cool enough to make it on Mr. Robot!” - Grant Murphy, Cloud Engineer
Apply for an Elastic{ON} Opportunity Grant Today!;;/blog/apply-for-an-elasticon-opportunity-grant-today;Anna Ossowski;November 10, 2017;News;; For the past few years, our developer relations team has been running an informal scholarship program of sorts to help folks from underrepresented groups in technology attend Elastic{ON}. Last year, this program brought ten individual scholarship attendees from five different organizations to the conference. Hearing their perspectives on Elastic{ON} and learning about the insights they took home when the conference wrapped inspired us to step up our scholarship game, and the was born. For more information, please .
Swiftype Joins Forces with Elastic;;/blog/swiftype-joins-forces-with-elastic;Shay Banon;November 09, 2017;News;; I am thrilled to announce that Swiftype is joining forces with us. Swiftype is the creator of a highly-regarded, popular SaaS-based Site Search product and a newly launched Enterprise Search product. Meet Matt and Quin in a coming up on November 29. Some of you may be wondering, why is a company focused on building SaaS search applications joining forces with a search technology company? As I’ve said, I’ve always viewed ‘search’ as a wonderful foundation to solve many different use cases, whether it is search embedded in an application: search used for logging, security, or metrics: or search being used to create a whole set of new applications and products. Well, Swiftype did this. They built an entire company focused on making it simple for users to put a search box on their websites or within their applications and an enterprise search product for organizations to manage disparate content from various web applications. Swiftype’s first product is . If you go to websites and help centers for companies like Asana, Shopify, SurveyMonkey, and TechCrunch, that’s the Swiftype Site Search product powering the search box experience. And under the hood of Site Search, is Elasticsearch. In fact, Swiftype has been using Elasticsearch for a long time, since Elasticsearch version 0.90 for indexing and storing searchable content. Like so many other successful SaaS companies do, Swiftype created an amazing user interface and a lot of infrastructure around Elasticsearch to provide an incredible SaaS-first Site Search experience. I’m excited to announce that Site Search will be offered with a new introductory subscription plan starting at $79/month (). This will allow customers to grow at their own pace. In addition, Swiftype's Site Search also provides an ideal migration path for customers. Earlier this year, Swiftype released its second product, Swiftype . This product has web crawlers and out-of-the-box connectors to cloud applications like Atlassian, Box, Dropbox, Github, Google Apps, Microsoft Outlook, Salesforce, Slack, Zendesk, and an API to build custom connectors. With the EOL of traditional enterprise search products like , Swiftype Enterprise Search meets the needs of today’s modern organization using many cloud-based shared and private content repositories. Effective immediately, Swiftype’s Enterprise Search product will be available as a beta via trial request. As we move towards making it GA, our combined engineering teams will integrate more capabilities of the Elastic Stack and X-Pack into this product, as well as make this product available as both a SaaS service and on-premise solution. I’d like to welcome the entire Swiftype team, customers and community to the Elastic family. It’s really exciting that Swiftype’s founders -- Matt Riley and Quin Hoxie -- and the Swiftype team are joining us to further extend our vision to offer tailored solutions on top of the Elastic Stack. Now some words from Swiftype’s founders. Swiftype set out to build a cloud-based search platform that dramatically simplified the process of creating powerful, high-quality search experiences. With the ever-growing amount of content published on the web, and with consumers expecting intuitive search tools, the need for world-class search capabilities is greater than ever before for businesses of all sizes. 
Our ongoing goal has been to stay ahead of this need by delivering incredible search for any team, technically savvy or not. When we began designing our own infrastructure, we made an early bet on Elasticsearch as a foundational technology in our system — and it turned out to be a good one. Elasticsearch not only powers our primary search functionality, but has also grown to support a wide variety of other product and operational use cases. Suffice to say, we have been power users of Elasticsearch for almost as long as is possible, and we’re thrilled to now be part of this incredible team spearheading the next wave of innovation in search. It quickly became obvious when meeting the team at Elastic that it was a special company, and as we learned more about Shay’s vision for the future, we were confident that we had found an amazing partner in the pursuit of our mission. As part of Elastic we could not be more excited to continue innovating with the Swiftype product suite and delivering an amazing service to our customers around the world.
Elasticsearch Preview: Countermeasures against filling up disks;;/blog/elasticsearch-6.0-counter-measures-against-filling-up-disks;Alexander Reelsen;November 08, 2017;Engineering;; More checks, fewer problems: This post will inform you about upcoming changes in Elasticsearch 6.0 with regard to the disk allocation decider, as there is one big change coming up you should be aware of. In addition, we will quickly talk about some improvements in our logging infrastructure as well, as this also affects disk space usage. The new disk threshold decider behaviour: In a previous post we already about disk space savings, just by upgrading. That's an awesome thing, but of course only one part of the equation. You can still run out of disk space. There are dozens of reasons this can happen. For instance, your monitoring is broken, you are receiving an insane data spike (maybe due to a DDoS attack), huge merges are going on, or one of your nodes goes offline and relocation happens. Elasticsearch has a list of allocation deciders, which check if a shard should be allocated on a node. For example, these deciders make sure that a primary and its replica shard are never on the same node. Allocation deciders also take into account the shard allocation filtering rules and the total shard limit per node. Each of those deciders returns a decision telling the caller whether it is OK to put a shard on this node or not. One of those allocation deciders is called the , which checks if there is enough space to allocate a shard. This decider allows you to configure a low and a high watermark. The low watermark is used to decide if a shard should be allocated on this node based on the remaining disk space. The high watermark is used to move shards away once a certain amount of the disk is used. This allows the remaining shards to have some more breathing room. So far, so good. One precondition of this decider is that it is able to properly read the available disk space. Most of the time this is not an issue, but you still might want to check. The easiest way to find out is to use the and check the information. You could do this via GET _nodes/stats/fs?human # another way of checking GET /_cluster/allocation/explain?include_disk_info=true Ok, so this is good, right? We get close to running out of disk space, we move the shard away, everything is awesome. Yeah, no. Only sometimes. What if there is no space on other nodes to move the shard to, or there is only one node? Then we cannot move it, and at some point we will run out of disk space, risking data corruption. Wouldn't it be great to just stop indexing in case we risk running out of disk space? Yes, it would. That's why we did exactly that from Elasticsearch 6.0 onwards. A new watermark has been added to the . If that watermark is passed (by default it is set to ), indices are marked as read-only. Which indices will be affected? Every index that is writeable and contains a shard on the affected node. In addition, the indices are not fully read-only: deletes are still allowed, as they just need to update a small tombstone file. One important tidbit needs to be taken care of by the cluster administrator. Once the indices are switched into this read-only mode, you have to manually mark them as writeable again.
There is no automatic mechanism to switch back to writeable once enough space has been reclaimed. In order to re-enable an index for writing and to remove that setting, execute PUT my_index/_settings { "index.blocks.read_only_allow_delete": null } Wait, there's more! So, this is great protection against running out of disk space. But it is not the only way to run out of disk space. Elasticsearch produces logs, which get written into dedicated log files. If you have a rogue query that hits you several hundred times per second, this query might generate lots of log entries. This is one of the reasons you want to have your data and your logs on different partitions or even on different disks, so that logging I/O does not affect your query/index I/O. With Elasticsearch 6.0 we will do a couple of things (some are new, some have been there before) with regard to the logfiles it creates. If you want to customize this behaviour, you can always change the file in the directory. References: Hopefully you won’t need to face this protection, but if you want to play around with it, we are thankful for any feedback. Participate in the Pioneer Program by finding and filing new bugs, and be eligible for Elastic swag.
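For reference, here is a minimal sketch of the three disk watermark settings discussed above, applied as transient cluster settings (the percentage values shown are only illustrative; check the 6.0 documentation for the actual defaults before relying on them):
PUT _cluster/settings
{
  "transient": {
    "cluster.routing.allocation.disk.watermark.low": "85%",
    "cluster.routing.allocation.disk.watermark.high": "90%",
    "cluster.routing.allocation.disk.watermark.flood_stage": "95%"
  }
}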
Sense Chrome plugin malware issue;;/blog/sense-chrome-plugin-malware-issue;Josh Bressers;November 08, 2017;Engineering;; Elastic has recently been made aware that the Chrome webstore has marked the Sense browser plugin as malware. The plugin in question is not published by, or affiliated with, Elastic. A long time ago Elastic wrote a plugin called Sense. Sense was the first version of what we now call the . The idea is that when curl is just too complicated you can interact with JSON using this tool. It gave developers the ability to easily write, debug, and modify the JSON being sent to Elasticsearch. The project was rather useful and used by many. As many successful open source projects go, Sense evolved and became part of something bigger than itself. Sense was added to Kibana version 4 as a plugin. In Kibana version 5 we renamed Sense to Console and include it with every copy of Kibana installed. It proved to be such a useful tool we wanted everyone to have easy access to it. When we decided to stop supporting the initial version of Sense the project was forked. In fact the project is still on . Anyone is welcome to fork the code and work on a project of their own. This is how open source works, the ability to fork a project or maintain your own version is incredible. Sometimes though things don’t always work out the way we’d like them to. We recently that the Google Chrome webstore has flagged a forked version of the Sense plugin as malware. We have a copy of this plugin, we looked at the contents and scanned it using VirusTotal, nothing obviously wrong stands out. That however doesn’t always mean it’s “safe”. It just means VirusTotal didn’t find anything wrong with it. Google has a pretty good track record about things like this, it’s likely there was something wrong with that plugin, it’s probably not a virus. Sometimes they flag things that are using extremely old and insecure libraries or even plugins that are doing something suspicious. Regardless, if you were using this plugin, you’d be wise to scan your system for possible problems. Chrome will automatically remove plugins from a running system that it believes contain malware. Even if you have a copy of this plugin and try to install it, Chrome will remove it eventually. If you were using the Sense Chrome plugin we encourage you to use the Console feature in Kibana. It has similar functionality and is part of a well maintained and actively developed project. There is a lesson here for everyone about software pedigree. Before installing things you find on the Internet, even through the Chrome store, you should note where it came from. Elastic was not the publisher of this particular plugin. There is a lot of dodgy software out there, some of it’s bad on purpose, most is accidentally bad. Elastic takes issues like this very seriously, we have teams of people who help us watch for problems like this and prevent them from happening in our products and services. There is a saying “software ages like milk, not like wine”. Old software can also be risky software.
Free Lunch for Open Source Engineers;;/blog/free-lunch-for-open-source-engineers;David Pilato;November 08, 2017;Culture;; When I started at elastic , we were a few people in the company and I was feeling pretty much alone in France. I’d been hired to write some code, help people on the forum (now ) and also continue evangelism efforts in France. That has always been a good balance for me. I mean that staying in front of my computer all day long is not the ideal thing. I need to talk with real people in real life and not only over Zoom or Google Hangout. I spoke one day with about that need and he told me that he would like to start doing BBL sessions. I asked: “?” Brown Bag Lunches aka BBL The brown bag lunch is the typical American bag you take out from a restaurant when you want to eat at your office. The idea was quite simple: Some people also call them “Lunch and Learn” sessions. I found that idea brilliant as it would exactly fill my needs in term of: So I started a day after with a Tweet: — David Pilato (@dadoonet) And answered me that Deal! The 1st BBL at SocGen So I gave . We were something like 15-20 attendees. It was a similar session of the one I gave at . Feedback has been very positive and I found out 4 key points: What about a website? Some other people started as well to run BBL in France. At some point , , and myself found that we should have a website to reference all the speakers/sessions so companies would be able to contact us. started that way. It’s super easy to add a talk… Well, it’s super easy as soon as you are a developer because you need to on GitHub a JSON document. :) The website helped a lot with getting more visibility. I think I’m getting 15 to 20% of my contacts through it. We started to a bit as the number of speakers/talks have been growing dramatically. Sadly, I did not find time to finish the job yet but thanks to and we have a running on CleverCloud and anytime a PR is merged the NodeJS hook fetch the data, transform it and upload that inside . The goal is to add a cool search engine on top of brownbaglunch.fr capable to deal with typos, find speakers around you using geolocation features, filter by label, using faceted navigation, giving autocompletion… 5 years later… I spoke with some of my colleagues about this and they started to organize BBL sessions in their respective countries. So I started to internationalize the website as well. We might want people to simply fork the original concept or host a global one for the entire world. We’ll see where it goes. Speaking at BBLs represents half of my evangelist activities. In 2017, I did it 17 times in Paris, Amiens, Lille… I went to companies like , , , , … Most of the time, I have around 15-20 attendees. Sometimes, companies don’t have enough seats for everyone, so I’m doing 2 separate sessions. Sometimes, we have people attending the session remotely over Zoom or Google Hangout. Sometimes we do instead of a BBL… Sometimes I’m getting super emotional as it can touch my heart really deeply. I had the opportunity to speak at in 2014. For those who don’t know Meetic, it’s a dating web/mobile application available in many countries. It’s Tinder-like, but Tinder was created years after Meetic. I was super excited to speak there for a very personal reason. In 2004, after a divorce, I met my wife on Meetic and in 2007 we got a baby, Max. I’d say that thanks to this website (in 2004, there was no mobile application!), my life changed totally. You can find a picture of Max searching for logs here! 
One of the BBL attendees was working at Meetic in 2004, so I was super thankful, and it was a great pleasure to share my knowledge with the team. I heard later that , which is even better! Seeds, Harvest Speaking at BBLs is a great opportunity to share your knowledge and experience with a community and to help it grow even faster. At BBLs, I find a lot of people who are building a POC or already running a project in production. And they are happy to share their story with the community at meetups or, even better, host the meetups and write blog posts. In terms of evangelism activity, we have seen that the number of downloads we are getting from France is super high. This means a lot to me. All those attendees I spoke to are starting to test the open source Elastic Stack. I’m calling them seeds. And as they get more successful in a few days or weeks, they surely move to production. In terms of business, the presales/sales team then only has to explain how the commercial features in add value on top of the Elastic Stack. These days, features like , , and are getting a lot of traction. In a sense, the sales team just has to “harvest” all those seeds. As a farmer, you just need to be patient, as it can take months if not years before we get commercially engaged, but in the end it eventually happens. Get involved! It’s now your turn! If you want to have an Elastic engineer speak at your company, just and we will find someone in your area for sure. As a distributed company in more than 25 countries, it’s easy to find someone in your community. If you have an interest and want to share your knowledge, share your own open source project, or build a community, just , send a pull request, and you’re done! Bon appétit!
Kibana 5.6.4 released;;/blog/kibana-5-6-4-released;Court Ewing;November 07, 2017;Releases;; We’re pleased to announce the release of Kibana 5.6.4, which includes some valuable stability improvements. One of the bugs that was fixed in Kibana 5.6.4 caused the browser window to crash when you used shift+return in console. We also fixed an issue where saved object import would fail under certain circumstances as a result of saved visualizations being imported before their associated saved searches were successfully imported. You can get Kibana 5.6.4 on our page and on . As usual, you can review all changes in the .
Three Ways to Get More Out of AWS re:Invent for Elasticsearch Users;Three Ways to Get More Out of AWS re:Invent for Elasticsearch Users;/blog/three-ways-to-get-more-out-of-aws-reinvent-for-elasticsearch-users;Luisa Antonio;November 07, 2017;Culture;; AWS re:Invent has gotten so big, they don’t just have an expo floor plan, they have a and it’s all taking place in a deceivingly short 4.5 days. There’s so much to get out of re:Invent, make it a two-for-one when you stop by the Elastic booth for personalized hands-on-demos and unlimited Q&A time with Elasticsearch technical experts, plus receive your own pair of Kibana socks. Find us at booth #2031 in the Sands Expo Hall located in The Venetian. We are towards the back right of the expo hall, just across from Cloudreach. If you passed the Salesforce booth, you went too far. Elastic folks will be available at this booth during all expo hall hours. for expo hall hours. This year will mark the third year Elastic will be at AWS re:Invent and we’re excited to make this event even more valuable for our Elasticsearch users. Seeing is believing and we’ve got some dynamic demos in store. and sign up for one of the sessions listed below or request one that works for your schedule.Elastic Cloud Enterprise (ECE) Demo: You know, the software that makes Elastic Cloud (our hosted and managed Elasticsearch product). Take a behind-the-scenes look at the architecture and see firsthand how to upgrade in a few clicks and monitor your deployments from a single console.Demo TimesSecurity Analytics with Machine Learning Demo: Watch the Elastic Stack in action. Machine learning, search, and analytics join forces to track a real-world data exfiltration attack.Demo TimesHave questions after attending your re:Invent Elasticsearch sessions? We’ll have Elastic Stack experts on hand to take your toughest search, logging, analytics, and metrics questions and talk through product developments coming down the pipeline. Ask us about the difference between Amazon’s Elasticsearch Service and , how many shards you should have in your cluster, or anything that’s on your mind. You can literally ask us anything! Not sure what to ask? What’s better than using Elasticsearch? Why, sporting Elastic socks of course!You know what’s even better? These socks will visualize your data on the go. Well, not really, but we’re working on it. . Of course we’ll have stickers galore and other swag for folks who drop by. If you’ve downloaded the , it’s also an opportunity to earn some points!What are you waiting for? Don’t waste time thinking about. Screenshot these deets before heading to AWS re:Invent and come say “hi!”AWS re:inventNovember 27 - December 1. 2017Hall B | Booth #2031Sands Expo | The Venetian201 Sands AveLas Vegas, NV 89169Not attending AWS re:Invent? Come visit us at our User Conference, Elastic{ON}, February 27 - March 1 in San Francisco. Early bird prices available for a limited time!
Deploying Elasticsearch on Microsoft Azure;Deploying Elasticsearch on Microsoft Azure;/blog/deploying-elasticsearch-on-microsoft-azure;Christoph Wurm;November 07, 2017;Engineering;ja-jp; I recently had the chance to present at Azure OpenDev, giving an overview of what running and using Elasticsearch and the Elastic Stack looks like today. What I'd like to do today is spend a few minutes going into more detail on some of the topics. But first, here's the recording: See the for the other talks by Microsoft, GitHub, CloudBees / Jenkins, Chef, and HashiCorp. Azure MarketplaceAs demonstrated in the talk, the easiest way to get started with Elastic on Azure is to use the . You can deploy it directly from the Azure Portal and it's going to handle all of the steps required to get Elasticsearch and Kibana up and running: Provisioning instances and storage, deploying and configuring the software, setting up networking and finally bringing everything up. We have a about how to use it. Now, there is another way of using it besides running it from the Marketplace. Since it is open source (the sources are ) you can also choose to deploy it from the command line. This allows you to automate things, and to make any customizations to its workings that you might want to do. We have about that. If you're using the template, please drop us some feedback at . HardwareSome of the most often asked questions when deploying on Azure are: Which instances should I use? We found the to be a good fit. Like every other data store Elasticsearch is very dependent on the amount of memory available to itself (as the JVM heap) and to the underlying host system (used for the important filesystem cache). You can read a bit more about how memory should be assigned in . We also recommend using . Backed by Solid State Drives (SSD) it allows Elasticsearch to reach its stored data quickly - and users will benefit from improved response times. also come with encryption at rest (via ). For bigger clusters, we recommend having three dedicated master nodes. They will not be storing any data, but will handle cluster management tasks like creating new indices and rebalancing shards. Small D series instances are most often good enough. Same as master nodes, Kibana has relatively light resource requirements. Most computations are pushed down to Elasticsearch, so you can usually run Kibana on small D series instances as well. Since it typically does a lot of processing it is best deployed on . AvailabilityOftentimes you will want to deploy a highly available Elasticsearch cluster that stays online even in the face of instance or zone failure. Azure has several concepts that help you design redundancy into your deployment and is a good read. Azure has geographical regions around the world. Each region then contains multiple data centers very close together. You will most likely want to choose whichever region is closest to you, or closest to the users of the system. All nodes of an Elasticsearch cluster should be deployed in the same region. Each tier of the Elastic Stack should be in . Two instances of Kibana should be in one, two instances of Logstash in another, and the Elasticsearch nodes in a third set. Azure distributes instances across . During a planned maintenance event, only one update domain is going to be rebooted at a time, while only the machines in the same fault domain are sharing a power source and a network switch. 
Distributing your instances across domains ensures the availability of instances in expected and unexpected circumstances. Azure is previewing the concept and is supporting . Going forward, this is likely going to be the best way to deploy Elasticsearch. Each zone should have one master-eligible node (or a dedicated master node) and data nodes should be distributed across zones and tagged appropriately using . When using dedicated Elasticsearch master nodes (see above), using is a good way to scale the Elasticsearch data nodes up and down as needed. BackupElasticsearch has a to ship index files to a remote backup location. An official is available, and it . Collecting Data with Beats and Logstash BeatsWe've for enriching events collected by the Beats with Azure metadata (instance_id, instance_name, machine_type, region). So no matter whether you use to tail log files, for system metrics, the new for Linux audit logs, or any other of the many official and community beats - you will always know on which machine an event originated. LogstashIn contrast to the Beats which collect data from the source, Logstash is commonly used to receive data from the Beats for further processing - or to pull data from intermediary systems. There are a number of third party input plugins available specifically for Azure: Reads data . Given a storage account name, access key, and container name, it will . Reads messages . SummaryDeploying Elasticsearch and the Elastic Stack on Azure is a great idea, and hopefully this post gives you many pointers on how to do it. Let us know how it goes!
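To sketch the zone-aware shard allocation mentioned above, the usual approach is to tag each node with a custom attribute in elasticsearch.yml and enable allocation awareness on that attribute (the attribute name and value here are arbitrary examples, not Azure-specific settings):
# on a node running in the first availability zone
node.attr.zone: zone-1
cluster.routing.allocation.awareness.attributes: zone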
Brewing in Beats: New Dashboards for Auditbeat;;/blog/brewing-in-beats-new-dashboards-for-auditbeat;Monica Sarbu;November 06, 2017;Brewing in Beats;; Welcome to ! With this weekly series, we're keeping you up to date with what's new in Beats, including the latest commits and releases. New Auditbeat dashboardsWith this , Auditbeat gets new configuration samples in the and three new dashboards: These new dashboards will be present in Auditbeat 6.1 Windows services metricsetThanks to our regular contributor , Metricbeat gets a metricset in the module that collects information about which services are running and data about each of them. Fields are things like “name”, “display_name”, “uptime”, “state”, “start_type”. This new metricset is scheduled to be released in Metricbeat 6.1. Other changesRepository: elastic/beatsAffecting all BeatsChanges in master: Changes in 6.0: PacketbeatChanges in 6.0: FilebeatChanges in master: PackagingChanges in master: Changes in 6.0: DocumentationChanges in master: Repository: elastic/kibanaTime series visualizationsChanges in master:
Brewing in Beats: Kubernetes deployment files;;/blog/brewing-in-beats-kubernetes-deployment-files;Monica Sarbu;November 02, 2017;Brewing in Beats;; Welcome to ! With this weekly series, we're keeping you up to date with what's new in Beats, including the latest commits and releases. Kubernetes deployment files: We are making it easier to deploy Filebeat and Metricbeat 6.0 on Kubernetes by providing deployment manifest files. You can find more details about deploying Filebeat and Metricbeat on Kubernetes in the , but it can be summarized as: curl -L -O https://raw.githubusercontent.com/elastic/beats/6.0/deploy/kubernetes/filebeat-kubernetes.yaml # edit the YAML file to set the Elasticsearch connection information kubectl create -f filebeat-kubernetes.yaml The above commands install Filebeat as a DaemonSet, ensuring one agent is running on each Kubernetes node, and configure it to pick up the logs from , unwrap them from the JSON objects, automatically enhance them with Kubernetes metadata (pod names, labels, etc.), and ship them to Elasticsearch. Metricbeat follows a similar . In progress: Filebeat modules for Elasticsearch and Logstash. We have work-in-progress PRs for adding Filebeat modules for and . Kibana is coming next. This is part of making the Elastic Stack easier to monitor with the Elastic Stack. Both modules are planned to be released with 6.1. Thanks, Hacktoberfest: We are pleasantly surprised by how many of you contributed to Beats as part of Hacktoberfest this year, and we would like to thank the following contributors: We are looking forward to the next Hacktoberfest. Other changesRepository: elastic/beatsAffecting all BeatsChanges in master: Changes in 5.6: MetricbeatChanges in master: Changes in 6.0: PacketbeatChanges in master: Changes in 5.6: FilebeatChanges in master: Changes in 6.0: TestingChanges in master: Changes in 6.0: PackagingChanges in master: Changes in 6.0: DocumentationChanges in master: Changes in 6.0:
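For completeness, the Metricbeat deployment follows the same pattern as the Filebeat commands shown above; the manifest path below simply mirrors the Filebeat one and is our assumption of the repository layout, so double-check it before running:
curl -L -O https://raw.githubusercontent.com/elastic/beats/6.0/deploy/kubernetes/metricbeat-kubernetes.yaml
# edit the YAML file to set the Elasticsearch connection information
kubectl create -f metricbeat-kubernetes.yaml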
Using the Elastic Stack and Graph to Tackle Toxic Content;;/blog/using-the-elastic-stack-and-graph-to-tackle-toxic-content;Mark Harwood;November 01, 2017;Engineering;; This is a challenging time for media organizations that distribute content they don't author themselves. Increasingly, sites that serve social media, music, or video are paying closer attention to the messages they're delivering. While free speech is a widely cherished principle, in practice, most commercial organizations exercise some form of content review outlined in their own terms of service. These restrictions on content aren't motivated by an organisation's desire to act as some form of moral arbiter - often the pressures are external. While search engines have historically helped surface content that's desirable, they can equally be applied to the challenge of identifying content that's undesirable, as we'll see in this post. Volume of content and the complexity of making judgement calls, however, can make this task more challenging. (Not to mention the issue of balancing false-positives and false-negatives.) There are a couple of approaches (which are not mutually exclusive) to do so: We'll explore both scenarios using real data and the Elastic Stack. Proactively identifying content The good news is you don't necessarily need to adopt complex analysis of your text, audio, or video content to identify candidates for review. We can use the same people who liked X also like Y… techniques employed by recommendation algorithms. The difference is we are focusing on users with undesirable tastes rather than desirable ones. Let's start by looking at some real data (that we've anonymised for this post) in the form of user profiles from a music streaming service where each user profile contains their list of favourite artists. If we start with a single artist name (we'll refer to them here as fictitious Band X) that's known to be associated with hate speech we can query the user profiles to see which other artists are favoured by these Band X fans. We can walk these connections simply by hitting the + button in X-Pack's Graph UI. First, a word on meaningful relationships Before we look at any connections, it's important to appreciate that the Graph API that underpins the UI uses some special relevance-ranking logic from our search engine heritage that is not found in typical graph databases. Let's turn this special feature off in the settings to see what problems these significance algorithms help avoid: With this feature turned off our top suggestions for the interests of Band X fans look like this: Our resulting graph shows a link between Band X and The Ramones, a popular group not known to be associated with hate speech. When we click on the link between Band X and the ramones suggestion, we can drill down into the stats of how many users like Band X and The Ramones using a Venn diagram. There are 97 Band X fans who like The Ramones, and while that may be a large number it is not significant — The Ramones are generally popular (just like The Beatles or ) and a huge majority of their fans have no interest in hate speech. The Ramones are not relevant to this content exploration -- they are off-topic -- and should not appear in the top recommendations. It should be obvious from this example that popularity is not the same thing as significance. 
Let's turn the default significance setting back on: Now when we walk the top connections let's see what significant suggestions we find: The Ramones are no longer present and instead we find Band Q. Only 41 of the Band X fans like Band Q — but there are only 53 Band Q fans in total. That's a huge overlap and suggests that the two bands are strongly associated even though they are less popular than The Ramones. Band Q is an anonymised name but you only need to see the real band names from this dataset to know that the significance algorithms are staying on topic while following the connections. This is the first hop in what could be a multi-step expansion. We started with a single known-bad, not knowing too much about hate content, and discovered some very strong leads. When we hit the + button in the UI again we are now asking a much better question: find people who like bands X OR Q OR Y OR Z and widen the net to identify more sources of hate-related content. Repeat as desired. This same iterative operation is possible using custom client code, multiple queries, and use of the significant_terms aggregation. As demonstrated in these examples, this is very simple using the Graph UI in X-Pack. Dealing with outputs Using these techniques on real data, it is easy to uncover large amounts of questionable content that will need careful human review before blacklisting. That said, simple blacklisting is unlikely to be a scalable business solution due to sheer content volume, the difficulty of reviewing each case fairly, and dealing with appeals. The highest-scoring outputs of the algorithms can be manually reviewed and then blacklisted, but this may just be the tip of an iceberg. A useful coping mechanism might be to adopt a greylist policy for the large numbers of lower-scoring matches that staff don't have time to review. These greylist items could only be discoverable by those users searching specifically for that content but are not promoted or recommended in any people also like… type user recommendations, nor are they featured alongside advertisers' messages who may not value the brand association and . Reactive content removal Media organisations can also choose to rely on their site users to report undesirable content using report this type buttons. Sadly, like the content on a site, takedown requests may not be as innocent as the site owner would wish. Groups of like-thinking users or bots can coordinate their requests to try to remove content which they find objectionable, but is not actually in violation of a site's terms of service. Site owners therefore need to review takedown requests and identify coordinated attempts to censor content. This activity is similar to analyzing review fraud in a marketplace where shill bots may be employed to artificially boost the reputation of a seller with positive feedback. However, in this use case, bad actors are trying to flood the review subject with negative feedback. The means of detecting artificial feedback is the same, though: coordinating actors typically share something in common that normal independent actors don't. This might be: One-off sharing coincidences between independent actors are to be expected but high numbers of these coincidences between user accounts would indicate collusion. We can use the X-Pack Graph API to identify the coordinating actors and generate alerts which can be reviewed using the Graph UI. 
We can index takedown requests using a combination of these terms: Using the Graph API and selections of the above terms as vertices in the graph, it is possible to walk the connections for each ContentId to pull together a summary of the users objecting to that content. Below is an example of review collusion: The red vertices are reviewers of a service and the blue vertices are a token that describes some aspect of their reviews (in this case I chose date+hour of review but I could also use IP address, MD5s of review text or join date). The lines show which reviewers are associated with which review aspects. The cluster on the right shows a healthy, normal reviewer — his blue vertices are not shared by many other reviewers. The cluster on the left however shows an extraordinarily high number of coincidences — many reviewers just happening to synchronise their reviewing behaviours on repeated occasions over many months. This is likely a single actor using multiple fake accounts to game the system, and these accounts are candidates for being blocked. Equally, a network showing a healthy gathering of many mature reviewers with no evidence of collusion could score the subject content as a high-confidence takedown judgement and be prioritised for human review or automatically blacklisted given a suitable score threshold. A detailed example of setting up a review monitoring platform is here: Summary Companies can likely expect growing pressure to enforce their terms of service, and audience sizes and advertising revenues will depend on their ability to do so. We need tools that help staff quickly separate signal from a sea of noise to successfully face the hard challenge of reviewing and managing content. In these examples, we've seen reactive and proactive solutions to dealing with undesirable content using features from X-Pack. Try it for yourself today and get to grips with your own content:
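As a rough sketch of the query-based equivalent of the Graph exploration described above, a significant_terms aggregation can surface artists that are disproportionately common among fans of a known-bad band rather than merely popular ones (the index and field names here are hypothetical):
GET user_profiles/_search
{
  "size": 0,
  "query": { "term": { "favorite_artists": "band x" } },
  "aggs": {
    "related_artists": {
      "significant_terms": { "field": "favorite_artists", "size": 10 }
    }
  }
}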
Elasticsearch 6.0.0-rc2 released;Elasticsearch 6.0.0-rc2 released;/blog/elasticsearch-6-0-0-rc2-released;Clinton Gormley;October 31, 2017;Releases;; We are excited to announce the release of , based on . This is the sixth in a series of pre-6.0.0 releases designed to let you test out your application with the features and changes coming in 6.0.0, and to give us feedback about any problems that you encounter. Open a bug report today and become an . This is a pre-release and is intended for testing purposes only. Indices created in this version . Upgrading 6.0.0-rc2 to any other version is not supported. Also see:
Logstash Lines: Experimenting With Bytecode Generation;;/blog/logstash-lines-2017-10-31;Andrew Cholakian;October 31, 2017;The Logstash Lines;; Welcome back to The Logstash Lines! In these weekly posts, we'll share the latest happenings in the world of Logstash and its ecosystem. I'm really excited to announce a new patch that just landed in master and will most likely be released in Logstash 6.1: native Java execution with runtime . This is all thanks to the amazing work of , who's leveraged his significant expertise to deliver us this patch. You can test it out by using the special flag on the master branch of Logstash. Logstash will have two ways of executing in releases with this patch: Our goal is to continue to improve the feature-flagged code generation and get feedback from users willing to try it out as an experimental feature, then make it the default execution sometime in 6.x once it's been tested more in the field. If you're interested in giving it a shot, it's probably best to wait until it's out of master and in a proper release behind the aforementioned feature flag, as this feature is still under active development.
Elastic Stack 6.0.0-rc2 released;;/blog/elastic-stack-6-0-0-rc2-released;Tyler Hannan;October 31, 2017;Releases;; If you’ve been following along closely you will be aware that we’ve been busy in preparation for 6.0.0 GA. It is our intent, and hope, that rc2 represents as close to final code as is possible before the GA release. This also means that – if you want to test in advance of GA – this could be close to your final chance. While releasing all the things on the same day is no trick (but quite the feat of coordination), we do hope you find it a treat to play with upcoming features. We continue to work hard squashing bugs and adding features to each product because, like mummies, we are afraid to unwind. Before you get too excited, keep in mind that this is still a release candidate so don’t put it into production. There is no guarantee that any of the 6.0.0 pre-release versions will be compatible with other pre-releases, or the 6.0.0 GA. During the 5.0 release, we introduced the Elastic Pioneer Program and are continuing the with the 6.0 preview releases. Keep in mind that you can . That’s correct, release candidates are now available on your favourite hosted Elasticsearch provider. All of the products have multiple changes of varying complexity. We’d encourage you to test with the entire Elastic Stack. Elasticsearch Get It Now!
Thank You for Speaking: Motivation from Elastic{ON} Tour Toronto;;/blog/thank-you-for-speaking-elasticon-tour-toronto;Renuka Hermon;October 30, 2017;Culture;; Dog parks, hockey, craft coffee . . . it’s like the city of Toronto has an insider’s view of our Slack channels at Elastic. My first Elastic{ON} event was bound to be great.Strolling in at 7 a.m. on a boiling hot, late-September day in Toronto (weather anomaly anyone?), I saw the Sony Centre transformed into a vibrant space surging with Elastic{ON} energy. I’ll come out and say it, our Events and Design teams don’t mess around. All eyes were drawn to the stage. Between chatting with folks at registration and fetching extra computer chargers for attendees who were rapidly note taking, I saw turn to the crowd, ready to kick off his presentation. As he started discussing how they use the Elastic Stack to report on and monitor usage on the public Wi-Fi offering that they’ve built in the Toronto and NYC subway systems, we all knew we were in for a treat. He’s one of us, an Elastic user with a great sense of humor. He started his talk in one of the best ways possible: laughter. Here we see his favorite (impossible?) captchas that prevented users from joining the Wi-Fi. Then he dove right into the heart of the story. He described his adoption of the Elastic Stack and how BAI Canada uses it to “.” Jeremy walked us through his journey specifically around creating more robust reporting capabilities and gave us a peek into what their future holds for machine learning, analytics, and modeling. Although happy hour was yet to begin, there was plenty of humor, raw storytelling, and humility: qualities of those who create. That’s why we (community members and employees alike) attend. We get to chat face-to-face with the people who love these tools as much as we do and use them to make lives better, simpler, and happier. Writing as someone who would much rather sit behind a keyboard than stand in front of a crowd, I discovered a few points that may be helpful if you (like me) are considering speaking at an event, , or our .If you're looking for motivation to speak, remember this:Elastic users are clever and apply the software in ways we hadn’t predicted. Your use case is interesting and I guarantee someone in the crowd will have a light bulb switch on thanks to you. Newcomers, veterans, those who tinker, and those who have helped build the Elastic Stack are all in the crowd. If you’re considering applying to speak at an Elastic event, your fan base is built in. Have you always been an open source fan? Did your boss give you pushback on utilizing new software? Did the office hold a party in your honor because you found a way to visualize your security logs in a useful way? People in the crowd will relate. Behind the scenes, I heard a few veteran speakers mention pre-show jitters and post-show adrenaline rush. Of course, preparation is key. Daniel Palay and Livia Decurtins, who head up Elastic{ON} External Speaker Relations, make sure that no speaker question goes unanswered and that each presentation slide is spotless. Who’s that person commanding the room with Elastic Cloud Team Lead, Suyog Rao? That’s Jeremy, but next time it could be you . . . or me. Elastic engineers are sharing roadmaps at the same event, oftentimes on the same stage. And I will leave you with this: we also have incredible at Elastic{ON} Tour and Conference. Don’t miss out. Thank you, Toronto, for being an incredible host.
Elastic{ON} 2018: Announcing the Fourth Official Elasticsearch User Conference;;/blog/elasticon-2018-announcing-the-fourth-official-elasticsearch-user-conference;Shay Banon;October 30, 2017;News;; I personally believe that the culture of a company is created not by any one decision, but by thousands of small decisions made by hundreds of individuals. The culture of Elastic{ON} is no different. It is not the product of where we hold the conference or what we put on the walls, but of the thousands of personal interactions, meaningful conversations, and individual lightbulb moments that happen when you bring committed users from an open source community into the same physical space.This community inspires us to build better products that end up living up to what you, our users, expect us to bring to the table. Whether it’s through simply using our products, being innovative around new use cases, contributing code and effort in documentation, or opening issues around something that just doesn’t work, you continue to be the most critical drivers of new products and ideas.And from 27 February to 1 March of next year, more than 400 of your biggest fans from Elastic will be anxiously waiting to meet and talk with you at . Why? Because there is no “average” Elastic{ON} attendee. Every one of you is doing something creative that we want to hear about, bumping into problems we want to try and solve, and coming up with use cases we haven’t even dreamed about. We care deeply about the user experience you have when you embrace and deploy our products. We try to imagine, what is the first download experience that a user has? What is it like, that moment when you go and click download into your laptop and you’re there to try to solve a problem? And are our products living up to solving that problem? We want to talk with you about it at Elastic{ON} 2018. This year, there are a few things that I, personally, am really excited about. It really is mind blowing to think about how much the conference has grown since 2015, but the culture and the heart of Elastic{ON} remains the same. At Elastic we continue to work each day to make simple things simple and difficult things possible. At Elastic{ON} we get to share what the future holds and see what kind of amazing and difficult things our users are doing and what new tools we need to build for them. .
Which Korean analyzer shall I use?;;/blog/using-korean-analyzers;Kiju Kim;October 26, 2017;Engineering;ko-kr; Hangul (the Korean alphabet) was created in 1443 by King Sejong the Great. Before that, Korean people used Chinese characters, but only the Yangban, the ruling class, could actually learn and use them; ordinary people could hardly use them because Chinese is so different from Korean and they could rarely spare the time to learn it. Hangul is a phonetic alphabet and consists of 24 characters: 14 consonants (ㄱ[g], ㄴ[n], ㄷ[d], ㄹ[l/r], ㅁ[m], ㅂ[b], ㅅ[s], ㅇ[null/ng], ㅈ[j], ㅊ[ch], ㅋ[k], ㅍ[p], ㅌ[t], and ㅎ[h]) and 10 vowels (ㅏ[a], ㅑ[ja], ㅓ[ə], ㅕ[jə], ㅗ[o], ㅛ[jo], ㅜ[u], ㅠ[ju], ㅡ[ɯ], and ㅣ[i]). We can combine them to make 11,172 characters (syllables), e.g. ㅎ+ㅏ+ㄴ=한. A Korean analyzer is required to search Korean documents effectively. Korean is an agglutinative language, whereas English is an inflectional language and Chinese is an isolating language. A predicate changes its form according to its ending (e.g., ‘먹다’ and ‘먹고’), and a noun is usually followed by one or more postpositions (e.g., 엘라스틱서치(noun)+를(postposition)). If we query without a Korean analyzer, we can only match a single form of the predicates or nouns. For example, if we query ‘엘라스틱서치’, we don’t get the documents including ‘엘라스틱서치를’. A Korean analyzer analyzes ‘엘라스틱서치를 이용해서 한국어 문서들을 효과적으로 검색하려면 한국어 분석기가 필요합니다’ and extracts tokens such as ‘엘라스틱서치’, ‘를’, ‘이용’, ‘해서’, ‘한국어’, ‘문서’, ‘들’, ‘을’, ‘효과적’, ‘으로’, ‘검색’, ‘하려면’, ‘한국어’, ‘분석기’, ‘가’, ‘필요’, and ‘합니다’. With these tokens, we can query ‘엘라스틱서치’ and get the documents including either ‘엘라스틱서치’ or ‘엘라스틱서치를’. Currently, Elasticsearch has commercial and open source analyzers available and provides APIs to implement analyzers. Among them, seunjeon, arirang, and open-korean-text are the most widely used open source Korean analyzers. Open-korean-text supports only Elasticsearch 5.x. I installed these three Korean analyzers on Elasticsearch 5.5.0 and measured time and memory consumption during analysis. seunjeon arirang open-korean-text To see the effect of the JIT compiler, I ran the same test twice when measuring the analysis time. I used ‘time’ (http://man7.org/linux/man-pages/man1/time.1.html) to measure the time to analyze a Korean text (see at.sh, ot.sh, and st.sh in the appendix). I ran it once just after starting Elasticsearch, and then ran it again without restarting Elasticsearch. Fig. 1 Time to analyze a Korean text. Arirang is the fastest in both runs, but the 2nd run of seunjeon is much faster than its 1st run. Open-korean-text is similar to, or a bit slower than, seunjeon. Memory (Java heap) consumption was measured at three points: just before the analysis, maximum usage during the analysis, and just after the analysis. I used ‘jstat -gc’ () to measure the memory consumption (see am.sh, om.sh, and sm.sh in the appendix). Again, I ran the same test twice: once just after starting Elasticsearch, and then again without restarting Elasticsearch. Arirang showed almost no difference among the three points, whereas seunjeon showed a big increase during the analysis. Open-korean-text showed a moderate increase during the analysis, but the memory is mostly released after the analysis. Fig. 2 Memory consumption during Korean text analysis (1st run) Fig. 3 Memory consumption during Korean text analysis (2nd run). Finally, let’s see the analysis results. The original string is the preface King Sejong wrote when he invented Hangul. 
I used the same string for the time and memory consumption test. “나라의 말이 중국과 달라 한자와는 서로 맞지 아니할새 이런 까닭으로 어리석은 백성이 이르고자 하는 바 있어도 마침내 제 뜻을 능히 펴지 못할 사람이 많으니라 내 이를 위하여, 가엾게 여겨 새로 스물여덟 자를 만드나니 사람마다 하여금 쉽게 익혀 날마다 씀에 편안케 하고자 할 따름이니라” (translation: Korean language is different from Chinese language. It doesn’t fit in with Chinese characters. Many Korean people have difficulty expressing themselves with Chinese characters. To help them, I invented 28 new characters. I hope people learn and use them easily in daily life.) Table 1. Analysis results. Open-korean-text extracts the most tokens and differentiates adjectives and verbs, but mis-analyzed ‘달라’, ‘새’, ‘마침내’, and ‘케’ as nouns. Seunjeon also mis-analyzed ‘내’ as a verb. Arirang doesn’t provide part-of-speech information. By the way, how much of the total indexing time do the Korean analyzers take? When I tested with seunjeon, the total indexing time was 29.036 seconds (1st run) and 20.952 seconds (2nd run). The Korean analyzer took 47.83% and 26.67% of the total indexing time respectively. The memory consumption increase was 802 MB during analysis and 853 MB during indexing, which means the Korean analyzer accounts for a large part of the total memory consumption during indexing. Analysis time affects indexing time more than search time, because at search time you only analyze short keywords, not the long text that has already been indexed. Based on this test result, no one Korean analyzer seems to be absolutely superior. Arirang can be a good candidate when speed and memory consumption are important, but you’ll have to use seunjeon or open-korean-text if you need part-of-speech information. One thing to remember is that Korean analyzers take a considerable amount of time and memory during indexing, so the choice of Korean analyzer is an important part of your Elasticsearch configuration. Appendix ()
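If you want to reproduce a small piece of this comparison yourself, the sketch below calls the _analyze API for each plugin on a local 5.5.0 node using the Python requests library. The analyzer names in the list are illustrative placeholders; check each plugin's documentation for the exact analyzer or tokenizer name it registers.
import requests

ES = "http://localhost:9200"
text = "엘라스틱서치를 이용해서 한국어 문서들을 효과적으로 검색하려면 한국어 분석기가 필요합니다"

# Placeholder names; replace with the names registered by the plugins you installed.
analyzers = ["seunjeon_default_tokenizer", "arirang_analyzer", "openkoreantext-analyzer"]

for analyzer in analyzers:
    resp = requests.post(f"{ES}/_analyze", json={"analyzer": analyzer, "text": text})
    if resp.ok:
        tokens = [t["token"] for t in resp.json()["tokens"]]
        print(analyzer, tokens)
    else:
        print(analyzer, "not available:", resp.status_code)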
Elastic{ON} Tour Stops and Their AMA Heroes;;/blog/elasticon-tour-stops-and-their-ama-heroes;Tony Sleva;October 26, 2017;Culture;; “The AMA is worth the price of admission, as far as I’m concerned.” — Michael Alexander, Senior Software Engineer After attending my first Elastic{ON} Tour event, the above quote may most accurately encapsulate my takeaway from the . For all that the event offered, including talks by Shay, multiple Elastic product managers, and special guests from X-Box, the highlight for many attendees was the Ask Me Anything booth staffed by authentic Elastic software engineers. I should stop for a moment. You likely already know about , but there’s a chance you haven’t heard of the . Elastic{ON} Tour stops are one-day events in cities around the world (50% of this year’s stops are outside of the U.S.) that offer a localized Elastic{ON} experience — content tailored to your region with users and engineers from your local community. Each tour stop is made up of a series of speaker sessions (roadmap, best practices, use case success stories, etc.) with breaks interspersed throughout to allow for mental digestion and professional networking. But while all that is going on, away from the presentations, closer to the food spreads, is a table stocked with Elastic engineers. Specifically, Elastic engineers that know a whole lot about the Elastic Stack (because they helped build it). Even more specifically, Elastic engineers that will answer any question you can think of because their only job for the day is to answer your questions about Elastic. Go ahead, ask them anything. I dare you. Not sure where to start? Here are just some of the many questions that could be overheard in Seattle: Those are all questions that would usually be directed towards our support team by customers with a commercial . But at Elastic{ON} (and its tour stops), our AMA engineers are available for your questions all event long — no subscription required. And the Seattle attendees, knowing that this would be the case, weren’t about to let an opportunity this good slip through their fingers. Here’s what some of them had to say of their AMA experience: “It was really helpful to be able to talk to an actual developer from Logstash. They were able to quickly fix a problem that had been going on for two to three weeks.” — Jonathan Li, Software Engineer “An initial index design, while functional, was not scaling well due to the mapping and high number of deletes. The AMA engineers really helped and gave me some great ideas on how to approach re-architecting.” Look at that. Real-time answers from real-life Elastic engineers. How you leave happy? I learned a lot standing around that table. The first thing I learned was that our AMA engineers fear no question, and they will talk Elastic for hours (days, maybe). Next, I learned that a lot of answers start with, “,” which is a pretty good indicator to take a seat, because you’re about to dive deep into a use case exploration. Finally, and most importantly, I learned that AMAs are an invaluable resource for Elastic{ON} (and Tour) attendees, are supremely appreciated by all who stop by, and in some cases, the driving impetus for attendance.
Time Series, Annotations, and Anomalies with Kibana;;/blog/time-series-annotations-and-anomalies-with-kibana;Alex Francoeur;October 25, 2017;Engineering;; It's been a long busy summer and we've taken longer than to close out our Time Series Visual Builder blog series. If you haven't had a chance to view the first two video-blogs or want a refresher, we highly recommend setting aside a few minutes to watch them both (, ). Today we'll be going over how to add annotations to a time series visualization from anomalies detected by a machine learning job. If you'd like to follow along, we'll be using the latest version of Kibana with machine learning feature installed and logs from our . For this demo specifically, we are using the Kibana 6.0.0-RC1 build. The latest preview release can be found . In this video, you'll learn to do the following: Ready to dive in? Watch and follow along in the video below. We're moving fast with the Time Series Visual Builder! If you have a feature you'd like to see, I invite you to open an issue in the and add a label.
PSD2: Monitoring Modern Banking API Architectures with the Elastic Stack, Part II;;/blog/psd2-architectures-with-the-elastic-stack-part-ii;Loek van Gool;October 24, 2017;Engineering;; At Elastic, we :heart: APIs because developers love to work with them to get things done. APIs also have the power to change (or disrupt) an industry quickly and decisively, as is the case with The Revised Payment Service Directive (PSD2). APIs make it possible to seemlessly switch from Web browsers to apps, to deploy content to any platform, and to find the best deals among thousands of suppliers. PSD2 sets out to standardize APIs between EU banks and abolish the existing lock-ins that still exist in the industry. Because while financial institutions are closer to the forefront of the innovation curve than almost any other industry, the point can be made that this has not resulted in wide-spread open access to the core banking ecosystems - namely accounts and transactions. PSD2 is a directive from the European Union that will make banks open up access to their, otherwise private, core banking functions in ways that we have not seen before. PSD2 legislation introduces a breadth of opportunity for retail banks, while also introducing new risk. The Elastic Stack plays a vital role in many of the world’s banks today, and that will especially be true for PSD2 architectures. This is Part II of a series on PSD2 in which we will focus on creating “observability” in a public API architecture, that is to say at all times knowing the status of the business service, its anomalies that require attention and all historical raw data around individual users and requests. focuses on using the Elastic Stack for running next-generation retail banking APIs and also gives a general introduction of PSD2 regulation and strategic options for EU retail banks. A Shopping ListAt Elastic we get to see many customers running production, value-add installations, the successful deployments provide the business with a platform to leverage for insight. The commonality that can be extracted from these installations include but are not limited to: The Elastic Stack for Logging and Metrics At the highest level, Elastic is functioning as the data platform for all logs, metrics, and traces that are generated in the Elastic data platform. A separate cluster will ensure separation of resources and data. Data agents generate and collect relevant data into a pipeline that transforms the data before ingesting it into a permanent data store. From ingestion, that data is immediately available for automated and manual analytics: machine learning, dashboarding, ad-hoc queries, and the likes. The Elastic Stack for Logging and Metrics More specifically, the logical architecture looks like pictured above. The Elastic Stack offers a complete suite of products for API observability architectures: The Elastic Stack logical architecture for Observability combines all these products into an end to end platform with accompanying services, like Consulting and Expert Support. As you have probably read a bunch of times by now, Elastic :heart: APIs. That is why the Elastic Stack products natively supports REST API endpoints for easy integration into any architecture. Keeping an Eye on Things, All Things are composed of documents in the 1st Normal Form (1NF), usually with a timestamp. 
1NF is important to achieve linear scalability: it is not feasible to arbitrarily join multiple datasets of hundreds of terabytes while the user or a real-time process is waiting for the answer. Of course, it’s a good idea to join those datasets at time of ingestion! That still allows us to scale to billions of events per day without slowing down. Millions of similar events will stream into the Elastic platform via the Elastic Beats data agent and on to Logstash, Elastic’s data processing product. Logstash will be able to enrich, look up, filter, and transform the data in transit before storing it in Elasticsearch (a small sketch of an equivalent enrichment appears below). After Logstash, the same document might look like this. It has relevant information added to it that improves the observability of what is actually happening on our APIs. Bold fields were added by Logstash. There is ample opportunity to add in any business logic. A simplified, enriched event log describing a single API call, in JSON format, after enrichment with GeoIP information and a threat score: [{ timestamp: 2018-01-05T18:25:43.512Z, http_method: GET, request: transactions/latest, result: 200, error: null, ip: 123.123.123.123, geoip_fields: { country_iso_code: NL, city_name: Rotterdam, location: { lat: 51.922755, lon: 4.479196 } // other fields omitted }, user: Alice, user_last_login: 2018-01-01T16:40:09.938Z, threat_score: 0.042, authentication_method: app_fingerprint, ... // other fields omitted }] When we pre-filter, pre-aggregate, or otherwise remove data before our data store, we will, by definition, lose an unknown amount of information. Elasticsearch will take billions of logs and metrics to provide you with an unobstructed view of what is actually happening, in real time. Kibana sits on top of the stack to discover data and manage Elastic components. This is where scalability becomes important. When we pre-filter, pre-aggregate, or otherwise remove data before our data store, we will, by definition, lose observability. Luckily, the Elastic Stack can take on any workload, even if you turn out to be the largest retail bank on the globe. Point Solutions for Logs, Metrics, and Traces. A number of point solutions for a subset of the desired functionality are available, often closed source and frequently including a form of vendor lock-in. Apart from the added complexity of buying, deploying, and operating multiple systems where one can suffice, the real problem is the additional overhead of having to deal with multiple ‘truths’ at the same time. While attackers are rampaging through the system, or outages are hampering performance, your SecOps and DevOps might be manually correlating the “logging solution” output with the “metrics solution” output, possibly demanding yet another tool to overlay on said point solutions. And while some integration options are usually available, some of these do not expose their raw data willingly. Elastic clears these issues completely, by bringing together what should be together. Scaling Up Within Budget. So, now that we have established the need to and keep it in a real-time data store long enough to train machine learning jobs, understand longer-term patterns of behavior, and investigate interesting events, how do we keep costs at bay? Elastic has support for several advanced strategies: A multi-tier data architecture looks like this: Monitoring a World-Class API Service. All of these provide their own perspective on what is happening in the system, so it helps to keep them in the same place. 
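As referenced above, here is a small sketch of an equivalent enrichment done with an Elasticsearch ingest pipeline instead of Logstash, using the geoip processor (provided by the ingest-geoip plugin on 5.x/6.x). The pipeline, index, and field names are illustrative, and a real deployment would add further processors for lookups such as the threat score.
import requests

ES = "http://localhost:9200"

pipeline = {
    "description": "Enrich API call events with GeoIP data",
    "processors": [
        # Resolve the client IP into geoip_fields, mirroring the document above.
        {"geoip": {"field": "ip", "target_field": "geoip_fields"}}
    ],
}
requests.put(f"{ES}/_ingest/pipeline/api-events", json=pipeline).raise_for_status()

event = {
    "timestamp": "2018-01-05T18:25:43.512Z",
    "http_method": "GET",
    "request": "transactions/latest",
    "result": 200,
    "ip": "123.123.123.123",
    "user": "Alice",
}
requests.post(f"{ES}/api-events/event?pipeline=api-events", json=event).raise_for_status()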
This is where the agnostic nature of Elastic shines: it really does not limit the types of data that can be used on it. You can happily aggregate metrics into KPIs on dashboards, alongside frequent errors taken from log files, with stack traces in the same data store, so that DevOps Engineers can dive into anything interesting in seconds. Kibana makes it possible for everyone to create the most relevant perspective on the data, and share those visualizations, dashboards, graphs, and machine learning jobs with the organization. Or just keep them for themselves. The (Un)known (Un)knowns with X-Pack Machine Learning. Self-learning anomaly detection is all about tackling both and . We believe that even if nobody has predicted something happening, that does not mean it’s not relevant if it . At the same time, you probably have other things to do than create alerts for everything that you know could happen. Known knowns we can easily cover with X-Pack Alerting. It uses pre-defined boundaries of what is “OK” and what is “not OK,” and will respond in real time to anything in the known known department (a minimal example watch is sketched at the end of this post). The other two need something more. Enter X-Pack Machine Learning. It will learn from history to predict the future, and tell you when something is not right, including the associated probabilities. It covers both known unknowns and unknown unknowns by looking holistically at all the data. At the same time, it’s so easy that a kid can use it, freeing up time for your people with the “sexiest jobs of the 21st century” (Harvard Business Review) to work on other, more complex challenges. Traces, or action-specific logs. And of course, you’ll be able to dive into anything interesting or suspicious, across your infrastructure and application stack. Any information that you have made available about a single request (or many!) is right in front of you. We would love to talk to you some more, but our time is running out. Some good reads on IT Operations Analytics with the Elastic Stack: Finding Bad Guys. In security, as well as in IT operations, we can utilize X-Pack Machine Learning to find unusual patterns in all of our data, with quantitative and qualitative algorithms. Should we spot suspicious activity, we can utilize X-Pack Graph to create connections between data points and traverse the logs using algorithms that put the first, the popularity second. This is an excellent way to weed out noise that could otherwise interfere with our observability. We would love to talk to you some more, but this blog is not meant as a comprehensive discussion of how to do API Security Analytics. Luckily, such resources already exist. Some good reads on Security Analytics with the Elastic Stack: Expand Your Horizon with APM. Adding APM (Application Performance Monitoring) to the Elastic Stack is a natural next step in providing our users with end-to-end monitoring, from logging, to server-level metrics, to application-level metrics, all the way to the end-user experience in the browser or client. It allows for more visibility into the operation of your APIs. APM is currently in alpha and hence not quite ready for production today. However, as new, exciting innovations go, it’s worth taking a look at it today! Some good reads on the upcoming APM module of the Elastic Stack:
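As promised above, here is a minimal sketch of a "known known" alert with X-Pack Watcher: flag the situation when the number of 5xx responses in the last five minutes crosses a fixed threshold. The index name, field names, threshold, and credentials are illustrative, and the watch API path shown is the 5.x/6.x X-Pack form.
import requests

ES = "http://localhost:9200"
AUTH = ("elastic", "changeme")  # placeholder credentials

watch = {
    "trigger": {"schedule": {"interval": "5m"}},
    "input": {
        "search": {
            "request": {
                "indices": ["api-logs-*"],
                "body": {
                    "size": 0,
                    "query": {
                        "bool": {
                            "filter": [
                                {"range": {"timestamp": {"gte": "now-5m"}}},
                                {"range": {"result": {"gte": 500}}},
                            ]
                        }
                    },
                },
            }
        }
    },
    # The pre-defined boundary of what is "not OK".
    "condition": {"compare": {"ctx.payload.hits.total": {"gt": 100}}},
    "actions": {
        "log_error_rate": {
            "logging": {"text": "More than 100 5xx responses in the last 5 minutes"}
        }
    },
}

resp = requests.put(f"{ES}/_xpack/watcher/watch/api_error_rate", json=watch, auth=AUTH)
resp.raise_for_status()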
Getting Started with the Elastic Stack on Microsoft Azure;Getting Started with Elasticsearch and the Elastic Stack on Microsoft Azure;/blog/getting-started-with-elasticsearch-and-the-elastic-stack-on-microsoft-azure;Christoph Wurm;October 20, 2017;Engineering;; As cloud adoption grows, we’re keeping pace at Elastic, developing integrations and making it easier to use our software wherever you are. These days, a number of our users are using Microsoft Azure for their deployments. For example, you can read about . Microsoft itself is an Elastic user in its own right. Elasticsearch is - a Top 50 website, and social network Yammer is . There are several ways to run the components of the Elastic Stack (Elasticsearch, Kibana, Logstash, and Beats) - and to ingest data from various applications and Azure components. Installing Elastic from the Azure Marketplace The easiest way to get started is to use the . We’ve partnered with Microsoft to create a configurable, UI-driven deployment template that you can use to create an Elasticsearch cluster with a Kibana instance on top. You can find more information on this in . And you’ll find the template itself . Let us know at if you have any questions or feedback. Ingesting Data using Beats and Logstash As the primary ingest technologies for the Elastic Stack, you can use both these components to get your data into Elasticsearch on Azure. is a data collection framework with implementations for many common data types: Log files, system and application metrics, network data, audit logs, or Windows Event data. You can view a list of official Beats , and a curated list of community-developed Beats . In contrast to the Beats which collect data from the source, Logstash is commonly used to receive data from the Beats for further processing - or to pull data from intermediary systems. Logstash supports a number of data sources, e.g. Syslog, various message queues like Kafka or Redis, and a number of Azure-specific data sources. And, of course, it can receive data from any Beat for further processing and enrichment. See for a list of input plugins and for a list of data transformations that are supported out of the box. Oct 25: Join Us Virtually at Azure OpenDev! Join us online for the next edition of Azure OpenDev where we will dive deeper into this. This is a live community-focused series of technical demonstrations centered around building open source solutions on Azure. On , we will be presenting together with , , , and on how to build a DevOps pipeline and bring an enterprise app to the Azure cloud. .
Alibaba Cloud to Offer Elasticsearch, Kibana, and X-Pack in China;;/blog/alibaba-cloud-to-offer-elasticsearch-kibana-and-x-pack-in-china;Shay Banon;October 13, 2017;News;zh-chs; Heya (Ni Hao) to Aliyun customers … Today, we're announcing a multi-year collaboration and strategic partnership with Alibaba Cloud to offer a new service called . This includes Elasticsearch, Kibana, and all features hosted on Alibaba Cloud, deployable by customers with a just few clicks from Alibaba's site. By collaborating with Alibaba Cloud, we'll be able to provide Alibaba's customers with the latest versions and features of the Elastic Stack (formerly known as the ELK Stack) and X-Pack and work together with Alibaba Cloud to build and launch new services such as logging. It's been an absolute privilege for me to be in Hangzhou at Alibaba's to make this important announcement in front of thousands of developers, startup entrepreneurs, and IT professionals. We view China as an important market and we love the pace of innovation that is happening all across the country with our software in gaming apps, mobile apps, web apps, and within traditional IT systems. While this super exciting for us, I'd like to recognize the wonderful community that's been built in China. It's these users who have contributed to putting Elasticsearch on the map. In the last two years, we've done over 50 meetups and developer events all across China. We've been to Beijing, Shanghai, Guangzhou, Shenzhen, Hangzhou, and many more cities. Our community is more than 5,000 users, and keeps growing daily. And this was all made possible by our first hire in China, Medcl Zeng, our engineering evangelist, and many volunteers who have helped us along the way. Alibaba Cloud team, thank you (Xie Xie) for a great week in China. It's going to be an exciting next few years for us. Simon Hu, President of Alibaba Cloud and Shay Banon, CEO of Elastic Shay Banon, CEO of Elastic, announces partnership with Alibaba Cloud at The Computing Conference 2017
Lexer's upgrade to Elasticsearch 5.4.1 improved search speeds by 30-40%;Lexer's upgrade to Elasticsearch 5.4.1 improved search speeds by 30-40%;/blog/lexers-upgrade-to-elasticsearch-5-4-1-improved-search-speeds-by-30-40;Chris Scobell;October 11, 2017;User Stories;; Scaling upLexer provide the data, tools and team to help companies genuinely understand and engage current and prospective customers. Elasticsearch helps us process large volumes of data and present it to our clients for insight and action. When we first began using Elasticsearch, our was managing around one million pieces of social content per day. Today that figure has climbed to 30 million per day, so upgrading towards a more robust cluster running Elasticsearch 5.4.1 was a crucial step in building a scalable product going forward. Elasticsearch 5.4.1 features dramatically improved indexing performance making it faster to get new data into the system. Plus, it comes with a new default scripting language () that is . Resilience is also a key focus: searches keep running even if hardware fails, or someone gets greedy with a huge complicated search. Needless to say, we were pretty eager to move in, so the first step was to work out how we were going to pack up the boxes in preparation for a full-scale migration. Preparing for migrationThe first step for us was moving our data into smaller indexes. We moved from a single index containing 2.8 billion tweets, comments, messages, articles and blogs to 90 smaller indexes of about 30 million objects each. These new indexes made the process of migration much more streamlined, and, more broadly, allowed our clients to make more efficient requests within Lexer. The next step was to ensure our searches were compatible with Elasticsearch 5.4.1 by updating our query generation library to ensure the queries generated in the interface (i.e. term matches or author searches) would work on the new search cluster. Here’s an example of the type of queries that had to be migrated, and and or filters which had to be translated into bool queries using must and should: Old Query { query: { filter: { and: [ { or: [ { query: { query_string: { query: car AND (blue OR red), default_field: content.content } } }, { query: { query_string: { query: bob, default_field: content.author } } } ] }, { query: { query_string: { query: facebook.com, default_field: content.source } } } ] } } } New Query: { query: { bool: { must: [ { query: { query_string: { query: facebook.com, default_field: content.source } } } ], should: [ { query: { query_string: { query: car AND (blue OR red), default_field: content.content } } }, { query: { query_string: { query: bob, default_field: content.author } } } ], minimum_should_match: 1 } } } We could translate these queries easily because we never stored Elasticsearch queries directly but instead store them in our own domain specific logical query structure. All we had to do was modify the library that translates our internal query format into Elasticsearch queries so that it would output queries that were accepted by Elasticsearch 5. So our concerns here were not so much with common queries like keyword searches but instead with the more complicated searches, like geolocation filters. Our interface allows users to draw a box on a map that is effectively 4 lat/long points, which is converted into an Elasticsearch query and run against the cluster. Here’s what this looks like in , our social analytics tool. 
In this example, the user is searching for people posting about the Australian Football League (AFL) in the vicinity of its biggest stadium, the Melbourne Cricket Ground. All of the charts and tables in this example are the result of Elasticsearch aggregations using the geo filter. The From Location map filter translates into an Elasticsearch query like this: { geo_bounding_box: { geography.point: { top_left: { lat: -37.8169206345321, lon: 144.978375434875 }, bottom_right: { lat: -37.8230227432016, lon: 144.987988471985 } } } } We needed to ensure that this query was converting the inputs from the user into something the cluster could understand, so we rigorously tested all of the query combinations and possibilities until we were 100% sure these queries worked against Elasticsearch 5.4.1 APIs. Pressing playWe used the method to move data from the old Elasticsearch 2.3.4 cluster to a new Elasticsearch 5.4.1 cluster. Elasticsearch provides a snapshot facility that can copy index data to one of many destinations (in our case we used Amazon S3) and then restore this data very quickly into a different cluster. After configurations were complete and we’d conducted a range of tests, we completed a full disaster recovery scenario, going from absolutely nothing to a brand new cluster with our complete 3.7B object data set within just 40 minutes. Going down the snapshot restore route we came away not only with an upgraded cluster, we also got a robust backup system that backs up our entire social dataset every 15 minutes as well as a quick and reliable disaster recovery scenario. A 30-40% performance increaseAfter performing the upgrade, the benefits were immediately obvious. Our general query performance improved 13% and some of our most frequently used features (those that show the context and history of a social object) saw a dramatic improvement of 30% to 40%. To put that into perspective, these features are currently used over 8,000 times a day and this translates to a time saving of 6 hours a day across all our clients! Looking forward We’ve been with Elastic from the very beginning, and are proud to cultivate a unique use case for their stack. Overall, we came away from this project with faster speeds for our clients, better processes for performing Elasticsearch cluster upgrades, a robust backup system, and a tested disaster recovery procedure. It also gave us lots of ideas to continue improving search performance, and making our cluster more efficient and scalable as Lexer takes on the world.
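As an illustration of the query translation step described earlier in this post, the sketch below rewrites a legacy and/or filter tree into a bool query with must and should clauses. This is not Lexer's actual library, just a simplified Python illustration of the mapping.
def translate(node):
    """Recursively convert {'and': [...]} / {'or': [...]} nodes into bool queries."""
    if "and" in node:
        return {"bool": {"must": [translate(child) for child in node["and"]]}}
    if "or" in node:
        return {
            "bool": {
                "should": [translate(child) for child in node["or"]],
                "minimum_should_match": 1,
            }
        }
    # Leaf clauses (e.g. query_string) pass through unchanged.
    return node

legacy = {
    "and": [
        {"or": [
            {"query_string": {"query": "car AND (blue OR red)", "default_field": "content.content"}},
            {"query_string": {"query": "bob", "default_field": "content.author"}},
        ]},
        {"query_string": {"query": "facebook.com", "default_field": "content.source"}},
    ]
}

print(translate(legacy))
Because the internal query format is a plain tree, a recursive walk like this is all that is needed; the leaf clauses themselves are already valid in Elasticsearch 5.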
Brewing in Beats: Enhance data with Azure metadata;;/blog/brewing-in-beats-enhance-data-with-azure-metadata;Monica Sarbu;October 10, 2017;Brewing in Beats;; Welcome to ! With this weekly series, we're keeping you up to date with what's new in Beats, including the latest commits and releases. Enhance the event with Azure metadata Thanks to , the add_cloud_metadata processor gets for Azure. If configured, it enhances each event with . This is useful in case you want to see what logs came from a specific Azure instance or what metrics are associated with it. This feature is scheduled to be released in 6.1. Support for TLS renegotiation TLS renegotiation is disabled by default in the Go standard library (and therefore in Beats), because it significantly complicates the state machine and has been the source of security issues in the past. However, it is sometimes , so this adds support for enabling TLS renegotiation. The setting is and the options are (default), , and . The new setting can be used anywhere TLS is configured (the outputs, the http module in Metricbeat, Heartbeat, etc.). This feature will be available in 6.1 and is being ported to 5.6 as well. Work in progress: Autodiscovery We have started the work around autodiscovery in Beats. Once it’s finished, autodiscovery (the name still to be discussed) will allow for scenarios like: In general, auto discovery will allow the Beats to react and adapt to changes in the ever more dynamic infrastructures. See this for the general approach we take for the building blocks, as well as the first implementation for Docker. This work is scheduled to be released in the 6.x time frame. Participate to Hacktoberfest with Beats You can participate in the fourth annual Hacktoberfest by contributing to the Beats open source project. Just start by looking at the GitHub issues tagged with the hacktoberfest label. Other changesRepository: elastic/beatsAffecting all BeatsChanges in master: Changes in 6.0: MetricbeatChanges in master: PacketbeatChanges in 6.0: FilebeatChanges in master: Changes in 6.0: AuditbeatChanges in 6.0: DashboardsChanges in master: TestingChanges in master: Changes in 5.6: Changes in 6.0: DocumentationChanges in master: Changes in 5.6: Changes in 6.0:
Kibana Dashboard Only Mode;Kibana Dashboard Only Mode;/blog/kibana-dashboard-only-mode;Stacey Gammon;October 10, 2017;Engineering;; Ever wish you could share your Kibana dashboards without the risk of someone accidentally deleting or modifying them? Do you want to show off your dashboards without the distraction of unrelated applications and links? In version 6.0 we’re making it easier than ever to set up a restricted access user, with limited visibility into Kibana. It’s already possible to create read only users, but new in 6.0 is a UI to match, and we’ve made it simple to set up. All you have to do is assign the new, reserved, built-in role, along with the appropriate data access roles, to your user and they will be in when they log in to Kibana. The Experience When a user in Dashboards Only Mode first logs into Kibana, they will only see the Dashboard app in the left navigation pane. All edit and create controls will be hidden. When a dashboard is opened, they will also have a limited visual experience, with no add or edit controls. Dashboard only mode pairs well with full screen mode. Share your dashboards with the team responsible for putting them up on the “big screen”, and be confident they will remain safe and indestructible. How to Set it Up Under Management > Security > Users, edit or create a new user and assign them the role, along with roles that grant the user appropriate data access. Advanced Configuration The built-in role grants read only access to the .kibana index, so if you have a multi-tenant setup, or are using a custom kibana index, you’ll have to use an advanced configuration. You can do this by creating your own roles, and tagging them as “Dashboard only mode” in Advanced Settings. You can find this configuration in Management > Advanced Settings, called . By default this will be set to . Here you can add as many additional roles as you like. When creating your custom dashboard only mode roles, you should grant them read only access to your custom index. Roles are stored in Elasticsearch, but because Advanced Settings are stored in your kibana index, you will have to modify this setting for each custom index you are using. Try It Out Dashboard only mode is available in 6.0.0-RC-1, which you can download and try out here: . We’d love for you to try it out and give us any feedback you have. As an added bonus, by finding and filing new bugs, and be eligible for Elastic swag.
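As a concrete sketch of the advanced configuration described above, the snippet below creates a custom role with read-only access to a non-default Kibana index through the X-Pack security API; the role would then be added to the Advanced Setting mentioned earlier. The role name, index name, and credentials are illustrative.
import requests

ES = "http://localhost:9200"
AUTH = ("elastic", "changeme")  # placeholder superuser credentials

role = {
    "indices": [
        {
            # Grant read-only access to the custom Kibana index for this tenant.
            "names": [".custom-kibana"],
            "privileges": ["read", "view_index_metadata"],
        }
    ]
}

resp = requests.put(
    f"{ES}/_xpack/security/role/custom_dashboard_only_user", json=role, auth=AUTH
)
resp.raise_for_status()
print(resp.json())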
Kibana 5.6.3 released;;/blog/kibana-5-6-3-released;Jim Goodwin;October 10, 2017;Releases;; Hello, and welcome to the 5.6.3 release of Kibana! This release of Kibana includes an important enhancement to improve the experience when importing dashboards and visualizations from pre-5.4 releases of Kibana allowing you to choose an existing index pattern for references in dashboards and visualizations. Kibana 5.6.3 is available on our and on . Please review the for rest of the enhancements and bug fixes.
Elasticsearch 5.6.3 released;Elasticsearch 5.6.3 released;/blog/elasticsearch-5-6-3-released;Clinton Gormley;October 10, 2017;Releases;; Today we are pleased to announce the release of , based on . Elastisearch 5.6.3 is the latest stable release, and is already available for deployment on , our Elasticsearch-as-a-service platform.Latest stable release in 5.x:Please , try it out, and let us know what you think on Twitter () or in our . You can report any problems on the .
Keeping up with Kibana: This week in Kibana for October 2, 2017;;/blog/keeping-up-with-kibana-2017-10-2;Jim Goodwin;October 02, 2017;Kurrently in Kibana;; Welcome to This is a weekly series of posts on new developments in the Kibana project and any related learning resources and events. 7 is in 6.0.0-rc1! Plus, security levels up thanks to you, our early testers. See what else is new, just in time for weekend projects: — elastic (@elastic) V6.0.0 stabilization edition. These have no impact on existing v5 users.
Elasticsearch 6.0.0-rc1 released;Elasticsearch 6.0.0-rc1 released;/blog/elasticsearch-6-0-0-rc1-released;Clinton Gormley;September 28, 2017;Releases;; We are excited to announce the release of , based on . This is the fifth in a series of pre-6.0.0 releases designed to let you test out your application with the features and changes coming in 6.0.0, and to give us feedback about any problems that you encounter. Open a bug report today and become an . This is a pre-release and is intended for testing purposes only. Indices created in this version . Upgrading 6.0.0-rc1 to any other version is not supported. Also see:
Beats 6.0.0-rc1 released;;/blog/beats-6-0-0-rc1-released;Monica Sarbu;September 28, 2017;Releases;; Today we are excited to announce the first public release candidate for Beats 6.0. The 6.0 release day is approaching! The release is primarily about bug fixes and performance improvements, but we do have a change we want to highlight. But before that, let’s get those handy links out of the way: Lower number of shards in default configurations Starting with 6.0, the number of shards and other Elasticsearch index template settings can be changed directly from the Beats configuration files. This is convenient because it lets you easily adapt the size of the shards depending on how much data your Beats create. In RC1, we took this a step further and added explicit settings for the number of shards in the default configuration files. These settings overwrite the default number of shards that Elasticsearch has, which is 5. With this release we changed the default number of shards to 1 for the Beats that create metrics (Metricbeat and Heartbeat) and 3 for the Beats that create events (Filebeat, Winlogbeat, Auditbeat, and Packetbeat). This change should result in a lower number of shards created for a small to medium installation using the default configuration. If you have lots of Beats, or for other reasons expect high indexing throughput, you should consider increasing the number of shards in the Beats configuration. Become a Pioneer A big “Thank You” to everyone who has tried the alpha or beta releases and posted issues or provided feedback. We’d also like to remind you that if you post a valid, non-duplicate bug report during the pre-GA period against any of the Elastic Stack projects, you are entitled to a special gift package. You can find more details about the .
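The shard count itself is changed in the Beats configuration files as described above. As a small complementary sketch, the snippet below fetches the loaded Beats index templates and prints the number_of_shards each one will apply, so you can verify what a new index will get. The template name pattern is illustrative (Beats templates are normally versioned, e.g. metricbeat-6.0.0).
import requests

ES = "http://localhost:9200"

resp = requests.get(f"{ES}/_template/metricbeat-*")
resp.raise_for_status()
for name, template in resp.json().items():
    # GET _template returns the settings nested under "index".
    shards = template.get("settings", {}).get("index", {}).get("number_of_shards")
    print(name, "number_of_shards =", shards)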
Elastic APM enters alpha;;/blog/elastic-apm-enters-alpha;Ron Cohen;September 28, 2017;Releases;; We’re very excited to announce that Elastic APM is now in alpha. It is open source and brings support for monitoring Node.js and Python applications. Since Opbeat became part of Elastic, the team has been very hard at work reworking central components from opbeat.com to make them work on the Elastic Stack. This means Elastic APM is available for you to and try out on your local machine today. ArchitectureElastic APM consists of four components: are open source libraries written in the same language as your application. You install them into your application as you would install any other library. is an open source application written in Go which runs on your servers. It listens on port by default and receives data from agents periodically. To visualize the data after it's sent to Elasticsearch, you can use the pre-built Kibana Dashboards that come with APM Server. Later this year the UI will get a massive upgrade when we release a dedicated APM UI, similar to the intuitive and easy-to-use interface that is known from Opbeat today. The UI will be delivered as a Kibana plugin and will be included in the Beta release. Try it today! The alpha release that we’re making available today has been running on different sites for some time. While this software is alpha level, we encourage you to try it out - we’re always looking for feedback from users! Getting started with Elastic APM is straightforward: For more detailed getting started guide, follow the . Feedback wantedPlease don’t hesitate to get in touch if you run into any issues or if you have ideas for improvements. You can ask question on the or open issues directly on the . Finally, we’re also keen to know more about your stack to help inform our continued development. Please help us by answering this .
Elastic Stack 6.0.0-rc1 released;;/blog/elastic-stack-6-0-0-rc1-released;Tyler Hannan;September 28, 2017;Releases;; 6.0.0-rc1 is available today! And, because multiple releases on the same day are amazing, we also . Not only are we approaching, rapidly, the GA of 6.0.0, we are super excited that this release includes the alpha release of . Although it is in alpha, it has been running on multiple sites for some time and it wouldn’t be a proper dogfooded release if we didn’t sip the champagne while mixing our metaphors. If you want more history on why Elastic APM, is a great place to start . During the 5.0 release, we introduced the Elastic Pioneer Program and are continuing the with the 6.0 preview releases. Before you get too excited, keep in mind that this is still a release candidate so don’t put it into production. There is no guarantee that any of the 6.0.0 pre-release versions will be compatible with other pre-releases, or the 6.0.0 GA. We strongly recommend that you keep this far, far away from production. But we also recommend that you . That’s correct, now that we have a release candidate it is available on your favourite hosted Elasticsearch provider. (Or will be by the time you read this…). Elasticsearch For more detailed information, peruse the Elasticsearch . Kibana Visualize the future of your interacting with your data in the Kibana . Logstash Grok the details of rc1 in the Logstash . Beats We don’t ‘let the beat drop’ but we drop the updates in a . Get It Now!
Announcing the GA of Elastic Cloud on Google Cloud Platform (GCP), More Options to Host Elasticsearch;;/blog/announcing-the-ga-of-elastic-cloud-hosted-elasticsearch-on-google-cloud-platform-gcp;Uri Cohen;September 28, 2017;Releases;de-de,fr-fr,ja-jp; Great things happen in pairs, especially when the company who popularized the search box for the Internet partners with the world’s most popular open source, distributed search engine. This happened in April when to offer our hosted Elasticsearch product on Google Cloud Platform (GCP). Less than six months after this announcement, is now GA in four regions: US West (Oregon), US Central (Iowa), Europe (Belgium), and Europe (Frankfurt). As Google’s cloud business continues to rapidly grow and more and more developers adopt Elastic’s products for mission critical use cases like search, logging, security, metrics, and analytics, users can now deploy, manage, and scale their Elasticsearch clusters on GCP with a few clicks. Based on what we continually hear from our users, they want to use a hosted Elasticsearch service created by Elastic, not to be . Elastic Cloud is the only product on the market that comes with , support provided by Elastic technical engineers, and many other features like one-click upgrades, snapshots every 30 minutes, custom plugin support, Elastic’s popular map service for Kibana geo-visualizations, and a comprehensive set of out-of-the-box monitoring metrics and tools. With Google, we get an incredible partner and a powerful cloud platform to offer Elastic Cloud on. Our users and customers now get more choices for where to run their Elasticsearch and Kibana workloads. A Standard subscription to Elastic Cloud on GCP starts at $45/month and users can freely upgrade to . In the future, we look forward to expanding into more GCP regions and adding additional features and use cases. Spin up your now!
Brewing in Beats: Add openSUSE support;;/blog/brewing-in-beats-add-opensuse-support;Monica Sarbu;September 28, 2017;Brewing in Beats;; Welcome to ! With this weekly series, we're keeping you up to date with what's new in Beats, including the latest commits and releases. Official support for openSUSE 42 We have an openSUSE 42 VM to our packaging tests in , so we can officially support it starting with version 6.0. Note that this doesn’t change our stance on SUSE or openSUSE 11, which is that we provide binaries but they are not officially supported. The main reason is that while openSUSE has switched to systemd, SUSE still uses init scripts, and the init script that we ship with the RPMs doesn’t work on SUSE out of the box. Audibeat: Kibana 5.x version for the dashboards Auditbeat contains dashboards compatible with Kibana 5.x, so that it’s convenient to use with the previous version of the stack as well. This change didn’t make it for 6.0.0-rc1, but we intend to include it in 6.0.0-rc2. Other changes: Repository: elastic/beats Affecting all Beats Changes in master: Metricbeat Changes in master: Filebeat Changes in master: Testing Changes in master: Changes in 5.6: Documentation Changes in master: Changes in 5.6: Changes in 6.0:
TLS for the Elastic Stack: Elasticsearch, Kibana, Beats, and Logstash;;/blog/tls-elastic-stack-elasticsearch-kibana-logstash-filebeat;Jared Carey;September 27, 2017;Engineering;; Transport Layer Security (TLS) can be deployed across the entire Elastic Stack, allowing for encrypted communications so you can rest easy at night knowing that the data transmitted over your networks is secured. It may not seem all that necessary, but then again consider the impossible situation of making sure that no developer starts logging sensitive data into the logs that you are shipping to a central location. Sensitive data is usually thought of as passwords, customers' personal information, etc. However, this definition of sensitive data is far too narrow for the era of cyber security that we live in. Imagine a compromised router that allows an attacker to peer into the raw, unencrypted data on the network: simply seeing the logging data could reveal the operating system and the versions of all the software being used on that network. This provides every detail necessary for the attacker to look up known software vulnerabilities that could allow them to gain direct access to those servers. The security of an entire organization hinges on the weakest link, and in today's world of cybersecurity attacks, don't let your logging / search system be that weakest link. 6.x will require TLS for Elasticsearch node-to-node communication when using X-Pack security in a multi-node cluster. Read for more details. This blog will guide you through the process of setting up and configuring TLS to keep all data private from Filebeat -> Logstash -> Elasticsearch -> Kibana -> your web browser. You'll need a system that has some memory available in order to run each of these, as you will be setting up two Elasticsearch nodes (1 GB of memory per node by default; starting the second node is optional if you need to conserve some memory), one Logstash server (1 GB of memory by default), one Kibana server (~200 MB of memory), and one Filebeat (~10 MB). You will likely need 6 GB of total system memory, but 8 GB would be ideal since I have no way of telling what other software or memory-hungry browser (with 50 tabs open) you are running. Understanding TLS / Certs. First, we should start with some of the fundamentals by discussing what a certificate is and how it will be used. A certificate holds the public information, including the public key, that helps encrypt data between two parties, such that only the party with the matching private key can decrypt data from the initial handshake. Without wading too far into the details, the public and private key represent a hard-to-compute but easy-to-verify computational puzzle, where the private key holds very large numbers that allow for easily solving this computational puzzle. “Hard to compute” means that the math necessary to solve this puzzle, in modern public key cryptography, would take thousands of years with current compute resources. The private key holds the source numbers necessary to easily verify the puzzle, and the certificate is generated from the private key to contain the necessary inputs to this math puzzle. Additionally, the certificate will contain a form of identity, known as the Common Name (CN). Throughout the examples to follow, I will use a server's DNS name for the identity. All public parts of the puzzle and the identity of a certificate are first created by generating a Certificate Signing Request (CSR). 
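To make the CSR idea concrete, here is a minimal sketch that generates a private key and a CSR carrying both a CN and a SAN using the Python cryptography package; the host name, IP address, and file names are placeholders, and the certgen tool discussed later in this post can produce the same artifacts for you.
import ipaddress

from cryptography import x509
from cryptography.hazmat.backends import default_backend
from cryptography.hazmat.primitives import hashes, serialization
from cryptography.hazmat.primitives.asymmetric import rsa
from cryptography.x509.oid import NameOID

# Generate the private key that holds the "easy to verify" side of the puzzle.
key = rsa.generate_private_key(public_exponent=65537, key_size=4096, backend=default_backend())

# Build a CSR whose Subject carries the CN and whose SAN carries the real identities.
csr = (
    x509.CertificateSigningRequestBuilder()
    .subject_name(x509.Name([x509.NameAttribute(NameOID.COMMON_NAME, u"node1.elastic.example")]))
    .add_extension(
        x509.SubjectAlternativeName([
            x509.DNSName(u"node1.elastic.example"),
            x509.IPAddress(ipaddress.ip_address(u"10.0.0.5")),
        ]),
        critical=False,
    )
    .sign(key, hashes.SHA256(), default_backend())
)

# Written unencrypted here for brevity; as discussed below, encrypting private keys
# with a password is good practice.
with open("node1.key", "wb") as f:
    f.write(key.private_bytes(
        encoding=serialization.Encoding.PEM,
        format=serialization.PrivateFormat.TraditionalOpenSSL,
        encryption_algorithm=serialization.NoEncryption(),
    ))
with open("node1.csr", "wb") as f:
    f.write(csr.public_bytes(serialization.Encoding.PEM))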
The CSR allows you to request that an identity be inserted and signed by an authority, at which point the certificate is created. The certification / verification of the certificate is important, since the handshake that will take place between the client and server requires the client trusting the signing authority of the server's certificate. The client must either trust the server's certificate directly or it should be signed by an authority that the client trusts. For example, your OS / browser has a preset list of certificates that are publicly trusted. When you visit google.com, its certificate is signed by a chain of authorities that can create a path back to a trusted cert for your client OS / browser. Think of it like a notary for signing legal documents: the notary must be certified somehow in order to be recognized. A certificate authority, also known as a root authority, in the simplest form is self-signed - meaning the certificate generated was signed by using its own private key. Publicly trusted authorities have very strict standards and auditing practices to ensure that a certificate is not created without validating proper identity ownership. If you wanted to generate a certificate for http://elastic.example, you would need to prove ownership of this domain name / identity first. Inside a certificate, the Subject contains a Distinguished Name (DN) which at minimum will have the Common Name (CN). The CN can be set to anything, and the Subject Alternative Name (SAN) should be used to specify the proper identity such as DNS name and/or IP addresses. The SAN can also contain multiple DNS and/or IP addresses. This identity is used by the client to verify that the server it connects to matches the identity present in the certificate. The client will verify this by resolving the DNS name while trying to establish a new connection. You might at this point wonder how all the communications could be encrypted when only the server would have the information to decrypt. This is where the TLS handshake comes in, which is best explained with the picture below. To start this handshake, the client must make a request to connect to the server (1). The server then responds with a certificate (2), and the client is then responsible for validating the certificate and trusting the server's identity as mentioned earlier (3). Optionally, the server can also ask for a client certificate if you wanted the additional security where the server must also trust the client (4). The client then creates a new symmetric key, the shared secret key, which only the client has knowledge of at this point. It encrypts this shared key using the public key of the server, which allows the server to receive this new shared key and decrypt it (5). At this point the shared key is now only known to the client and server (7)(8), and can be used for encrypting and keeping the traffic private between the two parties (9). Now, let's put this knowledge into practice by setting up the Elastic Stack with TLS encrypted communication between each product. This setup will not cover all the features and settings, just the minimum to apply TLS encryption. Setup & Download To make things easier, we should set up a directory to be used throughout this blog. I'll be storing this in my user's home directory. The -p option creates the folder if it does not already exist in your home directory. $ mkdir -p ~/tmp/cert_blog Beats, Logstash, and Kibana have TLS support in the open source product.
Elasticsearch requires our commercial plugin, X-Pack, for TLS and other security features. X-Pack security provides authentication and authorization control to prevent access to indices, documents, and even fields within documents. X-Pack also provides alerting, monitoring, reporting, graph exploration, machine learning, and support! See for more details. Download the latest preview release for the products below. Select the zip or tar file format, and place these in the folder we created: (You can set up TLS with the 5.x Elastic Stack as well. The only major change in the instructions that follow is that you will not generate passwords for the built-in elasticsearch users, since these shipped with default passwords set) Extract each product into this same folder. From here forward when I refer to the {product} folder, this will be the extracted product folder location. For example, the elasticsearch folder for me is . Elastic certgen tool One amazing feature that X-Pack adds is a . Openssl can be used to generate and sign certificates, but it can be hard for even experienced users to use, and can lead to countless hours of frustration trying to insert common items like a Subject Alternative Name (SAN). The certgen tool makes it easy to generate the necessary certificates and even a signing authority. It can even be used to create CSRs if you intend to have your certs signed by a public or corporate / internal signing authority. To access the tool, we must first install Elasticsearch X-Pack. $ cd ~/tmp/cert_blog/elasticsearch-6.0.0-beta2 $ bin/elasticsearch-plugin install x-pack Create a Certificate Authority / Signing Authority Encrypting a private key with a password is a good practice, especially if it will be used to sign other certificates. Let's create this password that will be used for encrypting the certificate authority's private key. You could use whatever you would like here or use something like the command below to generate a strong password. Be sure to safely save whatever password you choose, as it is impossible to recover and you will need it in order to sign certificates. $ openssl rand -base64 32 <long complex password> Go into the elasticsearch folder $ cd ~/tmp/cert_blog/elasticsearch-6.0.0-beta2 In this next step, we will just create the root authority that we will use for signing other certificates. Use whatever name you would like (but retain the CN= part). In the example below the generated public certificate will have a lifespan of 10 years, and will have a large 4096-bit private key. When prompted, you will need to enter the password that you selected or generated. Hit enter to skip the subsequent instance name questions, since we don't want to create any server certificates yet. $ bin/x-pack/certgen --dn 'CN=MyExample Global CA' --pass --days 3650 --keysize 4096 --out ~/tmp/cert_blog/MyExample_Global_CA.zip ... Enter password for CA private key: Enter instance name: Would you like to specify another instance? Press 'y' to continue entering instance information: At some point, you will want to check out the for all possible settings and usage, but I'll provide all the commands necessary to get through this exercise. You should now have a zip file that contains your root certificate authority's private key and public certificate. Unzip this file, but keep in mind we will only be distributing the ca.crt file. The ca.key file should be stored away for safe keeping (along with the password from earlier that is needed to decrypt it).
$ cd ~/tmp/cert_blog $ unzip MyExample_Global_CA.zip Archive: MyExample_Global_CA.zip creating: ca/ inflating: ca/ca.crt inflating: ca/ca.key We can inspect the details of this new certificate with openssl. You'll notice that the Issuer and Subject are identical, which indicates the certificate is self-signed. The extensions section can contain information like the SAN or fingerprints that help identify the current or signing certificate. The basic constraints extension is important: CA:TRUE shows that this certificate can be used to sign other certificates. $ openssl x509 -noout -text -in ca/ca.crt ... Issuer: CN=MyExample Global CA Validity Not Before: Sep 24 19:42:40 2017 GMT Not After : Sep 22 19:42:40 2027 GMT Subject: CN=MyExample Global CA ... X509v3 extensions: X509v3 Subject Key Identifier: 8F:6C:8B:20:B3:7A:D9:18:31:9B:99:CC:8C:93:25:98:75:F4:4B:60 X509v3 Authority Key Identifier: keyid:8F:6C:8B:20:B3:7A:D9:18:31:9B:99:CC:8C:93:25:98:75:F4:4B:60 DirName:/CN=MyExample Global CA serial:0C:0B:14:99:98:D6:7B:64:0D:00:03:64:B8:1F:7D:F7:9F:BF:6F:30 X509v3 Basic Constraints: critical CA:TRUE ... Generate the server certificates Create a new file named certgen_example.yml. This example will generate the public cert and private key for two elasticsearch nodes, kibana, and logstash; usage of these certificates will require the DNS names to be properly set up. For testing purposes, we can edit /etc/hosts so these DNS names will be valid. instances: - name: 'node1' dns: [ 'node1.local' ] - name: node2 dns: [ 'node2.local' ] - name: 'my-kibana' dns: [ 'kibana.local' ] - name: 'logstash' dns: [ 'logstash.local' ] In the next command we will use the yaml file created above and generate certificates for each instance that will be valid for 3 years (use whatever period you are comfortable with, just keep in mind that when a certificate expires - it will need to be replaced). We must specify the cert and key for the signing / root authority that we created earlier, and the --pass option will prompt for the password necessary to decrypt the signing authority's private key. $ cd ~/tmp/cert_blog/elasticsearch-6.0.0-beta2 $ bin/x-pack/certgen --days 1095 --cert ~/tmp/cert_blog/ca/ca.crt --key ~/tmp/cert_blog/ca/ca.key --pass --in ~/tmp/cert_blog/certgen_example.yml --out ~/tmp/cert_blog/certs.zip Unzip the created file $ cd ~/tmp/cert_blog $ unzip certs.zip -d ./certs Archive: certs.zip creating: ./certs/node1/ inflating: ./certs/node1/node1.crt inflating: ./certs/node1/node1.key creating: ./certs/node2/ inflating: ./certs/node2/node2.crt inflating: ./certs/node2/node2.key creating: ./certs/my-kibana/ inflating: ./certs/my-kibana/my-kibana.crt inflating: ./certs/my-kibana/my-kibana.key creating: ./certs/logstash/ inflating: ./certs/logstash/logstash.crt inflating: ./certs/logstash/logstash.key Inspecting the certificate for node1, you will notice the issuer / signing authority is our MyExample Global CA. The Subject is the name we provided in the yaml, and the SAN has the proper DNS name. We are all set. $ openssl x509 -text -noout -in certs/node1/node1.crt ... Issuer: CN=MyExample Global CA Validity Not Before: Sep 24 21:42:02 2017 GMT Not After : Sep 23 21:42:02 2020 GMT Subject: CN=node1 ... X509v3 extensions: X509v3 Subject Key Identifier: A0:26:83:23:A8:C6:FB:02:F3:7F:C9:BC:1A:C9:16:C9:04:62:3E:DE X509v3 Authority Key Identifier: keyid:8F:6C:8B:20:B3:7A:D9:18:31:9B:99:CC:8C:93:25:98:75:F4:4B:60 DirName:/CN=MyExample Global CA serial:0C:0B:14:99:98:D6:7B:64:0D:00:03:64:B8:1F:7D:F7:9F:BF:6F:30 X509v3 Subject Alternative Name: DNS:node1.local X509v3 Basic Constraints: CA:FALSE ...
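If you prefer to script these checks rather than read openssl output, the same fields can be pulled programmatically. The sketch below is an optional aside (not part of the original walkthrough) and assumes the third-party cryptography package is installed; it loads the node1 certificate generated above and prints the issuer, validity window, and SAN entries.
import sys
from cryptography import x509
from cryptography.hazmat.backends import default_backend
# Load the PEM certificate produced by certgen (run from ~/tmp/cert_blog).
with open("certs/node1/node1.crt", "rb") as f:
    cert = x509.load_pem_x509_certificate(f.read(), default_backend())
print("issuer: ", cert.issuer)
print("subject:", cert.subject)
print("valid:  ", cert.not_valid_before, "->", cert.not_valid_after)
# The SAN extension carries the DNS identity that clients will verify against.
try:
    san = cert.extensions.get_extension_for_class(x509.SubjectAlternativeName)
    print("dns names:", san.value.get_values_for_type(x509.DNSName))
except x509.ExtensionNotFound:
    sys.exit("no SAN extension present")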
To test with these certificates, we need the DNS names to resolve. We can modify /etc/hosts for testing, but in production you should have proper DNS set up. The line for 127.0.0.1 / localhost in /etc/hosts should look something like this: 127.0.0.1 localhost node1.local node2.local kibana.local logstash.local Elasticsearch TLS setup Create a cert directory in the elasticsearch config folder $ cd ~/tmp/cert_blog/elasticsearch-6.0.0-beta2 $ mkdir config/certs We will be starting two nodes, so we need to create a second config folder $ cp -r config config2 Copy in the ca.crt, and the node's private key and public cert. $ cp ~/tmp/cert_blog/ca/ca.crt ~/tmp/cert_blog/certs/node1/* config/certs $ cp ~/tmp/cert_blog/ca/ca.crt ~/tmp/cert_blog/certs/node2/* config2/certs Configuring the elasticsearch nodes: edit config/elasticsearch.yml node.name: node1 network.host: node1.local xpack.ssl.key: certs/node1.key xpack.ssl.certificate: certs/node1.crt xpack.ssl.certificate_authorities: certs/ca.crt xpack.security.transport.ssl.enabled: true xpack.security.http.ssl.enabled: true discovery.zen.ping.unicast.hosts: [ 'node1.local', 'node2.local'] node.max_local_storage_nodes: 2 edit config2/elasticsearch.yml node.name: node2 network.host: node2.local xpack.ssl.key: certs/node2.key xpack.ssl.certificate: certs/node2.crt xpack.ssl.certificate_authorities: certs/ca.crt xpack.security.transport.ssl.enabled: true xpack.security.http.ssl.enabled: true discovery.zen.ping.unicast.hosts: [ 'node1.local', 'node2.local'] node.max_local_storage_nodes: 2 You will notice in the config above that network.host is set to the DNS name. network.host is a shortcut for setting both network.bind_host and network.publish_host. The bind_host controls which interfaces elasticsearch will be available on, and the publish_host is how we tell other nodes they should communicate with this node. This is important, since we want other nodes to connect using the proper DNS name set in the certificate, or they will reject the connection due to an identity mismatch. Additionally, discovery uses DNS names, since this list is used by the node at the initial startup phase to contact one of these hosts and discover / join the cluster. When this new node joins, the discovery node returns the list of all the nodes currently in the cluster (which is where the publish_host comes into play). Start up the first node $ ES_PATH_CONF=config ./bin/elasticsearch Open a new terminal window, go to the elasticsearch folder, and start the second node $ ES_PATH_CONF=config2 ./bin/elasticsearch We need to configure passwords for the various system accounts. Make sure both nodes start properly before continuing. You should see something similar to this log line: [2017-09-24T21:13:43,482][INFO ][o.e.n.Node ] [node2] started With a new terminal window, go to the elasticsearch folder $ cd ~/tmp/cert_blog/elasticsearch-6.0.0-beta2 $ bin/x-pack/setup-passwords auto -u https://node1.local:9200 Initiating the setup of reserved user [elastic, kibana, logstash_system] passwords. The passwords will be randomly generated and printed to the console. Please confirm that you would like to continue [y/N]y Changed password for user elastic PASSWORD elastic = #q^4uL*tIO@Sk~%iPwg* Changed password for user kibana PASSWORD kibana = %uhWtQCN-9GNa52vot_h Changed password for user logstash_system PASSWORD logstash_system = #3vs5PZDBrWTIVnCgOCh Save these passwords!
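As an optional check that the handshake and trust verification described earlier are actually happening, you can open a raw TLS connection to the first node and print what was negotiated. This is not part of the original walkthrough; it is a minimal sketch using only the Python standard library, run from the ~/tmp/cert_blog directory so the ca/ca.crt path resolves.
import socket
import ssl
# Trust only our private signing authority when validating the server's certificate chain.
context = ssl.create_default_context(cafile="ca/ca.crt")
with socket.create_connection(("node1.local", 9200)) as sock:
    # server_hostname drives both SNI and the SAN / hostname check described above.
    with context.wrap_socket(sock, server_hostname="node1.local") as tls:
        print("protocol:", tls.version())
        print("cipher:", tls.cipher())
        print("peer subject:", tls.getpeercert()["subject"])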
Now let's see that both nodes are listed in the cluster correctly (hint: add ?v to the end of the URL to get the column names) $ curl --cacert ~/tmp/cert_blog/ca/ca.crt -u elastic 'https://node1.local:9200/_cat/nodes' 127.0.0.1 42 100 14 1.91 mdi * node1 127.0.0.1 39 100 14 1.91 mdi - node2 Let's send that request to the second node that should be running on port 9201. curl --cacert ~/tmp/cert_blog/ca/ca.crt -u elastic 'https://node1.local:9201/_cat/nodes' curl: (51) SSL: certificate verification failed (result: 5) Ah, we only changed the port and not the DNS name. The curl client did not allow this since the server's identity did not match the certificate it was presenting. Let's correct that and try again using the correct DNS name for the second node. $ curl --cacert ~/tmp/cert_blog/ca/ca.crt -u elastic 'https://node2.local:9201/_cat/nodes' 127.0.0.1 20 100 24 2.04 mdi - node2 127.0.0.1 43 100 24 2.04 mdi * node1 We now have a working two node elasticsearch cluster. Keep in mind that two nodes is great for some quick testing, but for anything beyond a quick test it is imperative to properly set the discovery.zen.minimum_master_nodes setting to 2 when using two nodes. Kibana TLS setup From the kibana folder, install x-pack $ bin/kibana-plugin install x-pack This step will take a couple of minutes. Go grab a drink, you earned it. Next, create a cert folder in the config directory and copy in the certs. $ mkdir config/certs $ cp ~/tmp/cert_blog/ca/ca.crt ~/tmp/cert_blog/certs/my-kibana/* config/certs Edit config/kibana.yml. Make sure to insert the correct kibana user's password that was generated earlier. server.name: my-kibana server.host: kibana.local server.ssl.enabled: true server.ssl.certificate: config/certs/my-kibana.crt server.ssl.key: config/certs/my-kibana.key elasticsearch.url: https://node1.local:9200 elasticsearch.username: kibana elasticsearch.password: %uhWtQCN-9GNa52vot_h elasticsearch.ssl.certificateAuthorities: [ config/certs/ca.crt ] Start up kibana $ bin/kibana Once kibana has fully started, visit https://kibana.local:5601 in your web browser. You should get an error that the certificate is not trusted. This is expected since neither the direct certificate nor the signing authority is trusted by the browser. You can add / trust the newly created certificate authority to your OS / browser, but these steps can vary depending upon the OS / browser that you use. I'll leave that for you and google to figure out. Dismiss / continue past the certificate error for now, and login with the elastic user and the auto-generated password. Once logged in, click on the monitoring tab - and you should see an overview that elasticsearch has 2 nodes and kibana has 1 instance. We now have elasticsearch and kibana communications encrypted, and the certs are fully verified using DNS. Before moving on though, let's use the UI to set up an account that logstash can use to write to elasticsearch with. Click on the management tab We will need to set up a role that will grant the necessary permissions that are needed for our logstash configuration. Click on Roles Then click the Create role button Create the role as pictured below and click save. Now we will assign this role to a new user. Click on the Users tab Click the Create user button Fill in all the details pictured below. The email can be whatever you would like, it is not used beyond just having a contact record in elasticsearch. Assign the newly created role and click save.
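If you would rather script the role and user instead of clicking through the UI, the X-Pack security API can do the same thing. The exact privileges shown in the screenshots are not reproduced in the text, so the role body below is an assumption based on what the logstash output in the next section needs; treat it as a sketch and adjust the names and privileges to your own setup.
import requests
ES = "https://node1.local:9200"
CA = "/path/to/ca/ca.crt"  # the ca.crt generated earlier
AUTH = ("elastic", "<elastic password from setup-passwords>")
# Hypothetical role: just enough to let logstash create and write logstash-* indices.
role = {
    "cluster": ["monitor", "manage_index_templates"],
    "indices": [{"names": ["logstash-*"], "privileges": ["write", "create_index"]}],
}
print(requests.post(ES + "/_xpack/security/role/logstash_writer", json=role, auth=AUTH, verify=CA).json())
# Hypothetical user matching the credentials used in the logstash output config below.
user = {"password": "changeme", "roles": ["logstash_writer"], "full_name": "Logstash Writer"}
print(requests.post(ES + "/_xpack/security/user/logstash_writer", json=user, auth=AUTH, verify=CA).json())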
Logstash TLS setup X-Pack is not necessary to set up TLS for logstash, but we will install / use it since it will allow us to view logstash information in the Kibana monitoring UI - which is awesome. From the logstash folder run $ bin/logstash-plugin install x-pack We need to create a certs directory in the config folder, and copy in the certificates. $ mkdir config/certs $ cp ~/tmp/cert_blog/ca/ca.crt ~/tmp/cert_blog/certs/logstash/* config/certs The logstash-input-beats plugin requires the private key to be in the pkcs8 format. The following openssl command will make a new file in the pkcs8 format. $ openssl pkcs8 -in config/certs/logstash.key -topk8 -nocrypt -out config/certs/logstash.pkcs8.key Edit config/logstash.yml. Make sure to insert the correct auto-generated password for the logstash_system user. node.name: logstash.local xpack.monitoring.elasticsearch.username: logstash_system xpack.monitoring.elasticsearch.password: '#3vs5PZDBrWTIVnCgOCh' xpack.monitoring.elasticsearch.url: https://node1.local:9200 xpack.monitoring.elasticsearch.ssl.ca: config/certs/ca.crt Create config/example.conf. For the elasticsearch output config, the user and password will use the account that you just created in the kibana UI. input { beats { port => 5044 ssl => true ssl_key => 'config/certs/logstash.pkcs8.key' ssl_certificate => 'config/certs/logstash.crt' } } output { elasticsearch { hosts => ['https://node1.local:9200','https://node2.local:9201'] cacert => 'config/certs/ca.crt' user => 'logstash_writer' password => 'changeme' index => 'logstash-%{+YYYY.MM.dd}' } } Start logstash with the example configuration. $ bin/logstash -f config/example.conf After it is up and running, visiting the Kibana Monitoring page will now show a logstash section with 1 node and 1 pipeline! Filebeat TLS setup From the Filebeat folder, create a certs directory, and copy in the CA cert. We only need the signing authority, as Filebeat will only be a client talking to the Logstash server. You could configure Filebeat to also provide a client certificate if you wanted a form of mutual auth, but that is a topic for another day. $ mkdir certs $ cp ~/tmp/cert_blog/ca/ca.crt certs We need a test log to configure Filebeat to read. If you already have a log file somewhere, you can skip this step and just put in the correct path to that log file. If not, download the sample log dataset and unpack it into the Filebeat directory. Create example-filebeat.yml: filebeat.prospectors: - type: log paths: - logstash-tutorial-dataset output.logstash: hosts: ['logstash.local:5044'] ssl.certificate_authorities: - certs/ca.crt Then run Filebeat with this config $ ./filebeat -e -c example-filebeat.yml Now, you can visit the Kibana discover page, and click the Create button for a new index pattern. The index pattern is used for selecting logstash-* named indices for searching on. We should now have log data in the kibana UI! This data was transmitted fully encrypted from Filebeat -> Logstash -> Elasticsearch. Kibana pulled it from Elasticsearch encrypted and transmitted it to your browser encrypted. Huzzah!
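As one final end-to-end check (not part of the original post), you can confirm from the command line that the events shipped by Filebeat actually landed in Elasticsearch, reusing the CA file and the elastic password from earlier; a small sketch with the Python requests package:
import requests
# Count documents in the logstash-* indices over HTTPS, verified against our private CA.
resp = requests.get(
    "https://node1.local:9200/logstash-*/_count",
    auth=("elastic", "<elastic password from setup-passwords>"),
    verify="/path/to/ca/ca.crt",
)
print(resp.json()["count"], "events indexed")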
Logstash Lines: More Bundled Plugins and Testing Improvements;;/blog/logstash-lines-2017-09-27;Andrew Cholakian;September 27, 2017;The Logstash Lines;; Welcome back to The Logstash Lines! In these weekly posts, we'll share the latest happenings in the world of Logstash and its ecosystem. We've finished transitioning logstash core away from Travis to our own Jenkins infrastructure. Being on fast, stable hardware has really improved the reliability of our tests. Additionally, this move has reduced our PR test time from 45 minutes to often under 10 minutes per commit. We now split up some of the work among multiple jobs. Another part of this work has been stabilizing tests. The tests for Logstash are green more often now than they have been in a long time: all while the amount and coverage of testing in Logstash has grown astronomically. In Logstash 6+ all commercially supported plugins will now be part of the package. This has been a problem historically for users in air-gapped environments, who would need to go through the cumbersome process of downloading plugins that are part of our supported list. (#8318). The added plugins are: the aggregate filter, anonymize filter, de-dot filter, elasticsearch filter, jdbc_streaming filter, truncate filter, email output, and lumberjack output. We've continued to make strides improving our Windows support. We've been steadily making our Windows experience more consistent and correct, as well as improving test support in a variety of PRs.
Machine Learning for Nginx Logs - Identifying Operational Issues with Your Website;;/blog/machine-learning-for-nginx-logs;Steve Dodson;September 26, 2017;Engineering;; Getting insight from nginx log files can be complicated. This blog shows how machine learning can be used to automatically extract operational insights from large volumes of nginx log data. Overview Data science can be a complicated, experimental process where it is easy to , or the . Therefore, a key design goal for the Machine Learning group at Elastic is to develop tools that empower a wide spectrum of users to get insight out of Elasticsearch data. This led us to develop features such as “” and “” wizards in X-Pack Machine Learning, and we are planning to simplify analysis and configuration steps even more in upcoming releases. In parallel to these wizards, we are also planning to shrink-wrap job configurations on known Beats and Logstash data sources. For example, if you are collecting data with the , we can provide a set of shrink-wrapped configurations and dashboards to help users apply machine learning to their data. These configurations are also aimed at showing how we develop Machine Learning configurations internally based on our experience. Help us prioritize the next set of modules that should include preconfigured machine learning jobs by . The details of how to install these configurations will be covered in a subsequent blog. This blog is aimed at describing the use cases and configurations. Use Case Notes The configuration options for X-Pack Machine Learning are extensive, and often new users are tempted to start with complex configurations and select large numbers of attributes and series. These types of configurations can be very powerful and expressive, but require care as the results can be difficult to interpret. We therefore recommend that users start with simple, well-defined use cases, and build out complexity as they become more familiar with the system. (Note, often the best initial use cases come from automating anomaly detection on charts on the Operations team's core dashboards.) Example Data Description The data used in these examples is from a production system consisting of 4 load balanced nginx web servers. We analysed 3 months of data (~29,000,000 events, ~1,100,000 unique visitors, ~29GB data). Note, the data shown here has been anonymised. nginx : Sample log message: Once processed by Filebeat’s NGINX module configuration, we get the following JSON document in Elasticsearch: Use Case 1: Changes in Website Visitors Operationally, system issues are often reflected in changes in visitor rate. For example, if the visitor rate declines significantly in a short period of time, it is likely that there is a system issue with the site. Simple ways to understand changes in visitor rate are to analyse the overall event rate, or the rate of distinct visitors. Job 1.1: Low Count of Website Visitors This job can simply be configured using the ‘Single Metric Job’ wizard: Job configuration summary: This analysis shows a significant anomaly on February 27th where the total event rate drops significantly: (Note this analysis of the 29,000,000 events took a total of 16s on a m4.large AWS instance) Job 1.2: Low Count of Unique Website Visitors Event counts can be strongly influenced by bots or attackers, and so a more consistent feature to analyse is the number of unique website visitors.
Again this can simply be configured using the ‘Single Metric Job’ wizard: Again there is a significant anomaly on February 27th where the number of unique visitors per 15m drops from a typical 1487 to 86: Combining Job 1.1 and 1.2: The results from both jobs can be temporally correlated to give an ‘Overall’ view into the anomalousness of the system based on these features: This clearly shows, in a single view, that there was a significant anomaly on February 27th between 10:00-12:00 where the total event rate dropped, and the number of unique visitors dropped. The operations team confirmed the site had significant issues at this time due to a prior configuration change in the CDN. Unfortunately, they didn’t detect the user impact until 11:30 (due to internal users on Slack complaining), whereas with ML they would have been alerted at 10:00 when the issue occurred. This analysis can be combined with to give operations teams early insights into changes in system behaviour. Use Case 2: Changes in Website Behaviour Once simple behaviours are analysed, next steps are often to analyse more complex features. For example, changes in event rates of the different response codes returned by the webserver can often indicate changes in system behaviour or unusual clients: This use case is more complex as it involves analysing multiple series concurrently, but it can again be simply configured using the “” wizard: Results show some significant changes in the different response codes: In particular, again on February 27th there is a significant change in behaviour of response_code 404, 301, 306 and 200. Zooming in on 404s shows some significant anomalies: The first highlighted anomaly is attributed to a specific IP address as nginx.access.remote_ip is defined as an influencer (more on this in a later blog). The second highlighted anomaly represents a significant overall change in 404 behaviour. The increase in 404s on February 27th was again a new insight for the operations team, and represented a large number of dead links that had been introduced by the configuration change. Use Case 3: Unusual Clients Website traffic generally consists of a combination of normal usage, scanning by bots and attempted malicious activity. Assuming the majority of clients are normal, we can use population analysis to detect significant attacks or bot activity. The number of pages a normal user requests in a 5-minute window can be limited by how fast they can manually click website pages. Automated processes can scan 1000s of pages a minute, and attackers can simply flood a site with requests. There are a number of features we could use to differentiate traffic types, but in the first instance, the event rate and the distinct URL rate per client can highlight unusual client activity. In this case, we configure 2 population jobs: Job 3.1: Detect unusual remote_ips - high request rates Looking at unusually high event rates for a client (nginx_access_remote_ip_high_count) we get: This shows a number of anomalous clients. For example, 185.78.31.85 seems to be anomalous over a long time period: Drilling into a dashboard that summarises this interaction: This shows that this IP address has repeatedly hit the root URL (/) an unusually large number of times in a short time period, and that this behaviour continues for several days. Job 3.2: Detect unusual remote_ips - high distinct URL counts Looking at an unusually high distinct count of URLs for a client (nginx_access_remote_ip_high_dc_url) we get: Again, this shows a number of unusual clients.
Drilling into 72.57.0.53 shows a client accessing > 12000 distinct URLs in a short period. Drilling into a dashboard that summarises this interaction: This shows this client is attempting a large number of unusual URLs consistent with types of attack. Both these jobs provide real-time visibility into unusual clients accessing a website. Web traffic is often skewed by bots and attackers, and differentiating these clients can help administrators understand behaviours such as: Summary This blog attempts to show how X-Pack ML can provide insights into website behaviour. In upcoming Elastic Stack releases these types of configurations and dashboards will be available to end users as easily installed packages. This should empower users with proven tested configurations and also show users recommended types of configurations to copy and extend.
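The post configures all of these jobs through the Kibana wizards and does not include the underlying job JSON. For readers who prefer the API, here is a hedged sketch of what a population job along the lines of Job 3.1 could look like, created with Python's requests against the X-Pack ML endpoint; the bucket span, host, credentials, and field names (Filebeat nginx module style) are illustrative assumptions rather than the exact configuration used in the blog, and the single-metric jobs from Use Case 1 follow the same shape with a low_count or low_distinct_count detector instead.
import requests
# Illustrative population job: compares each client IP's request rate against all
# other clients in the same bucket, so unusually busy IPs (bots, floods) stand out.
job = {
    "description": "Unusual remote_ips - high request rates (sketch)",
    "analysis_config": {
        "bucket_span": "5m",
        "detectors": [{"function": "high_count", "over_field_name": "nginx.access.remote_ip"}],
        "influencers": ["nginx.access.remote_ip"],
    },
    "data_description": {"time_field": "@timestamp"},
}
resp = requests.put(
    "http://localhost:9200/_xpack/ml/anomaly_detectors/nginx_access_remote_ip_high_count",
    json=job,
    auth=("elastic", "<password>"),
)
print(resp.json())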
Kibana 5.6.2 released;;/blog/kibana-5-6-2-released;Court Ewing;September 26, 2017;Releases;; Kibana 5.6.2 is released today with a critical fix to the Upgrade Assistant in X-Pack basic as well as a fix for a metric visualization regression. The change to the Upgrade Assistant is critical for any users that started using Kibana prior to 5.5 and intend to upgrade to 6.0. The Upgrade Assistant helps users migrate their .kibana index to a single-type format that is compatible with Elasticsearch 6.0, but prior to version 5.6.2 it would not properly disable dynamic mappings for the .kibana index after migration, so saving an object in Kibana would result in an unexpected new type in the .kibana index, which would put Kibana into a permanent red status. If your Kibana install began in 5.5 or later, or if you haven’t run the upgrade assistant yet, just go ahead and upgrade to 5.6.2 to make sure this issue is fixed going forward. If your Kibana install predates 5.5, and you’ve already upgraded the .kibana index via the Upgrade Assistant in 5.6.0 or 5.6.1, then you must follow a manual process to fix the data corruption issue, which is documented in this , and you should upgrade to 5.6.2 as well. The metric visualization regression caused style properties set on existing metric visualizations to be ignored. Upgrading to 5.6.2 resolves this issue. Kibana 5.6.2 is available on our as well as on . As always, check out the .
Elasticsearch 5.6.2 released;Elasticsearch 5.6.2 released;/blog/elasticsearch-5-6-2-released;Clinton Gormley;September 26, 2017;Releases;; Today we are pleased to announce the release of , based on . Elasticsearch 5.6.2 is the latest stable release, and is already available for deployment on , our Elasticsearch-as-a-service platform. Latest stable release in 5.x: Please , try it out, and let us know what you think on Twitter () or in our . You can report any problems on the .
Centralized Pipeline Management in Logstash;;/blog/logstash-centralized-pipeline-management;Suyog Rao;September 26, 2017;Engineering;; Introduction Motivation behind this feature Multi-tenancy and self-service Security considerations Roadmap Longer term, we are working on providing syntax highlighting, autocomplete, and snippets support for the config editor. Version based rollback, you ask? On it. Oh, and did I mention we’re already thinking about a drag-n-drop UI to create pipelines? I can go on and on about all the exciting stuff that's brewing in our heads, but I’ll stop here. :) We would like to hear from you! Please let us know how you like this feature or if you have ideas for enhancements. You could also win plenty of swag when you provide feedback via our . What are you waiting for?! Get ‘stashin!
Using Elastic Machine Learning to Automate Complex Data Analysis at Sunhotels;;/blog/using-elastic-machine-learning-to-automate-complex-data-analysis-at-sunhotels;Juan Cidade;September 21, 2017;User Stories;fr-fr; An Interview with Juan Cidade, Head of IT Ops, SunhotelsHow has the exponential growth of incoming data affected the way you do business?As an online business providing travel services to European businesses, Sunhotels has always been data-driven at its core. Consequently, we have always embraced open source technologies - particularly Elasticsearch - to help us gain insight from our customer data and improve our services as a result. Now, as part of Australian travel group Webjet, we’re focusing heavily on the use of next generation data technologies to drive ongoing innovation and growth. Recently, the volume of data we handle has grown exponentially. We process search and booking requests for thousands of travel agencies and B2B suppliers across Europe. This includes enquiries from tens of thousands of travel agencies, and a huge swathe from third party aggregators. In two years, the average number of requests per second has grown from 600 to over 4000. That’s 250 million requests and thousands of bookings per day, originating from a host of different sources. We needed a platform that could process this kind of volume of data at speed and scale but, importantly, one that would reveal meaningful insight. How have Elastic technologies helped you manage the data revolution?Elasticsearch has been a part of the business for many years. Originally, we wanted to have some idea about search response times, and to find a way to differentiate between kinds of searches and their outcomes. Soon, we wanted to capture different strands of metadata from each search: helping us to understand more about each interaction, and be more responsive to issues. We started logging around 15 different fields across each search and booking: response time, destination, customer type, product contract (third party or direct etc). When putting all the raw data into Elasticsearch, we not only see response times across the board at a granular level, but we can analyse and understand all the different factors that might affect availability in one hotel: low availability, seasonality, price sensitivity. Using this information, we work with individual hotels, destinations, providers or clients to suggest solutions: change pricing, improve contracting/mapping/availability, connecting/disconnecting providers. Working with SQL, it was hard to track what was being searched versus what was being booked. It was particularly hard to find blind spots in sales. Now, the wider business uses the same dashboards that operations use, taking the insight to inform long-term strategies. Being able to track ‘look to book’ ratios and analyse how many searches result in a booking, per destination, per client, allows Sunhotels to tune sales and contracting strategies: useful when onboarding new providers and clients. We can make educated, profile-based assumptions and deliver a service based on historical insight. This enables us to optimize traffic. More bookings with less traffic: win-win. How does this help Sunhotels with its ongoing strategy of innovation and growth?There are two main ways in which our data management infrastructure benefits both us and our partners. Firstly we want to detect changes in behaviour more quickly so we can respond proactively. When we see trends are appearing over time (e.g. 
January is a busy time for beach holiday booking in Scandinavia versus the Spanish habit of booking last minute), we’re able to predict behaviours and align the service accordingly. Predicting behaviour from historical insight means we can be proactive in offering our sales/contracting department the information they need to act swiftly, too. This might mean boosting the number of properties available in a certain area or at a certain price point, if we expect demand to be high. This could also mean increasing activities in quieter regions: improving bookings and warning partners about anticipated lulls or spikes. Secondly, by being able to crunch terabytes of data within Elasticsearch, we are a very lean organisation. The travel industry - particularly B2B - is very traditional, characterised by large teams doing much of the heavy analytical lifting. With Elasticsearch taking the strain out of processing volume data, our team remains focused on using the analysis to improve the service. We’re experiencing dramatic efficiencies in terms of processing time and man hours, as a result. What part will machine learning play in Sunhotel’s strategy?As the business looks to us to provide deeper insight, Elastic’s X-Pack, and its machine learning functionality, are enabling us to automate a huge layer of analysis, currently being handled by 12 bespoke-built robots, managed by a very small team of engineers. We can spend a lot of time on monitoring and tweaking parameters, simply because we have ten times more data, external connections, and incoming requests than we did a year ago. We had to find a way to automate complex analysis, at scale. If one of our customers does 150 bookings a day, our robots can look at some of the behaviours and anomalies associated with those transactions but, to make meaningful assumptions about the relationship between points, a complex coding process has to take place. Without machine learning, it’s very hard to cross analyse: select metrics from multiple indices. We also want the infrastructure to take seasonality into consideration automatically. We usually have to alter this manually across all robots, which is very time-consuming and technical. Machine learning can help us analyse against consistent rules, implemented automatically. With multiple clusters and multiple nodes we knew we weren’t talking about trivial money and so, after a period of investigation with a number of machine learning solutions, we were pleased to see machine learning included as part of X-Pack. Machine learning can easily be deployed within the existing Elastic ecosystem: no integration or the need for separate logins. Figure 1: Jobs related to bookings (number and value of cancellations, bookings per provider, booking errors etc.) Figure 2: Response time from external providers
Space Saving Improvements in Elasticsearch 6.0;;/blog/minimize-index-storage-size-elasticsearch-6-0;Jason Zucchetto;September 20, 2017;Engineering;; Elasticsearch 6.0 ships with two great improvements to help minimize index storage size. The best part about the improvements is they will require no special configuration changes or re-architectures, and in most cases will only require a simple upgrade and a newly created index. To illustrate the improvements, we’ll use , a lightweight tool for ingesting metrics into Elasticsearch. After running Metricbeat for several days on both Elasticsearch 5.6 and Elasticsearch 6.0 (6.0 beta2), index sizes were 41.5% smaller for the Metricbeat workload on Elasticsearch 6.0: GET _cat/indices?v health status index uuid pri rep docs.count docs.deleted store.size pri.store.size yellow open metricbeat-2017.09.16 0b46voluSDmzfwCdYmYvZg 5 1 1694709 0 508.6mb 508.6mb yellow open metricbeat-2017.09.17 UKrTuwevS3urZkjeU8GFhg 5 1 1694385 0 500.7mb 500.7mb yellow open metricbeat-2017.09.18 dxFeMlabR_anYZ_C6BBq4A 5 1 1696223 0 512.7mb 512.7mb Total storage size over 3 days: GET _cat/indices?v health status index uuid pri rep docs.count docs.deleted store.size pri.store.size yellow open metricbeat-2017.09.16 7IK6c1bfSQCaFAp1i3axUQ 5 1 1696571 0 299.1mb 299.1mb yellow open metricbeat-2017.09.17 CcBCgLdfRESXH0UaGV6YCA 5 1 1695385 0 295.4mb 295.4mb yellow open metricbeat-2017.09.18 sZfCXx8ZReGzLIsjSFO4hA 5 1 1697063 0 296.1mb 296.1mb Total storage size over 3 days: (41.5% improvement in storage space from Elasticsearch 5.6) The test above used Metricbeat with the module. Deprecated _all field The “_all” field was deprecated in Elasticsearch 6.0; this is the first part of the explanation for the storage space savings we’re seeing. If you’re unfamiliar with the “_all” field, it’s a special field used to concatenate all values together, making it easy to search everything. The “_all” field made it easy to get started with Elasticsearch, however, the “_all” field uses a lot of additional storage space (especially as values are duplicated). PUT /user_profiles/profile/1 { "userid": "john123", "first": "John", "middle": "James", "last": "Smith", "city": "Alamo", "state": "California" } The “_all” field for the document above now contains the terms [ “john123”, “john”, “james”, “smith”, “alamo”, “california” ]. Using the “_all” field, it was easy to search across all fields; however, we’re now duplicating a lot of values, and using more storage space: GET /user_profiles/_search { "query": { "match": { "_all": "john123 john james smith alamo california" } } } With the deprecation of the “_all” field in Elasticsearch 6.0, we save the space that was spent indexing duplicate data. And if “_all” field functionality is still needed, the parameter should still be used in the index mapping. We ran a follow-up test to look at the effects of disabling “_all” in relation to the storage improvements we saw. When isolated, the deprecation of “_all” accounted for almost 40% of our index saving improvements (the other 60% was due to the sparse field improvements we’ll visit next). Sparse Field Improvements Elasticsearch 6.0 includes Lucene 7.0, which has a major storage benefit in how sparsely populated fields are stored (). Metricbeat, used in our test above, happens to use a lot of sparsely populated fields. If you recall, doc values (the columnar data store in Elasticsearch) have allowed us to escape the limitations of JVM heap size to support scalable analytics on larger amounts of data.
Doc values are a very good fit for dense values, where every document has a value for every field. But they have been a poor fit for sparse values (many fields, with few documents having a value for each field), where the matrix structure ends up wasting a lot of space. If you’re unsure of what a sparsely populated field is, it’s a field that contains a value for only a small percentage of documents. For instance, if we go back to our user profile example, the “middle”, “city”, and “state” fields may only be present in a few of the documents: Internally, Lucene 7.0 (which ships with Elasticsearch 6.0) retrieves data within doc values as an iterator, which allows for more efficient data storage behind the scenes (especially for sparsely populated values): As you can see from the diagram above, with random access retrieval for doc values (present in Elasticsearch versions prior to 6.0), blank/empty values must be maintained (and extra storage space used). The iterator-based approach does not need to store the empty/blank values, and the extra storage space used by the random access method can be reclaimed. In addition to storage improvements, the iterator-based retrieval method for sparsely populated fields has a number of performance improvements, including indexing speed. Sparse fields should still be avoided when they can be, as dense fields are the most efficient. More details can be found in . Both of these space saving improvements are present in Elasticsearch 6.0. We hope you’ll be happy with the new improvements, and look forward to hearing your feedback! If you want to try out Elasticsearch 6.0 for yourself, to determine any index storage size improvements, download and !
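If you want to reproduce the size comparison above on your own data, the per-index figures can be pulled from the _cat API and summed. A small sketch using Python's requests package (the host and index pattern are assumptions to adapt to your own cluster):
import requests
# format=json returns structured rows and bytes=b reports sizes in plain bytes,
# which makes the primary store sizes easy to sum across the metricbeat indices.
rows = requests.get("http://localhost:9200/_cat/indices/metricbeat-*?format=json&bytes=b").json()
total = sum(int(row["pri.store.size"]) for row in rows)
print("total primary store size: %.1f MB" % (total / (1024.0 * 1024.0)))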
Generating an Elastic Cloud Enterprise Client;;/blog/generating-an-elastic-cloud-enterprise-client;Greg Marzouka;September 20, 2017;Engineering;; At Elastic, we use the OpenAPI specification, formerly known as Swagger, for documenting the Elastic Cloud Enterprise REST API. The Elastic Cloud Enterprise (ECE) API allows for creating and managing clusters, performing upgrades and repairs, and other general automation tasks within ECE. See our previous blog post, , for a great overview of the API. What is the ? From the OpenAPI website: The OpenAPI Specification (OAS) defines a standard, language-agnostic interface to RESTful APIs which allows both humans and computers to discover and understand the capabilities of the service without access to source code, documentation, or through network traffic inspection. Adopting the OpenAPI specification allows us to generate our , providing a concise set of documentation: even more importantly, it empowers users to generate REST clients in the language of their choice. This ability to generate REST clients is incredibly useful for automating and developing your own software layer around Elastic Cloud Enterprise. In fact, we use our OpenAPI specification to generate many of the internal tools we use to manage Elastic Cloud. In this blog post, we're going to focus on client generation and how you can leverage our specification to generate a client in whatever supported language you desire. Getting the specificationIf you have a running ECE installation, the OpenAPI specification can be retrieved through the API by issuing a GET request to the following endpoint: on the coordinator host: $ curl -XGET https://ece-host:12443/api/v1/api-docs/swagger.json Otherwise, you can download the specification . Just change the version portion of the URL to the desired version of ECE. Inspecting the specification JSON a bit, you'll notice that each API endpoint is represented as an object. This object contains: For instance, the specification for shutting down an Elasticsearch cluster that’s running in ECE looks like this: /clusters/elasticsearch/{cluster_id}/_shutdown: { post: { security: [{ basicAuth: [] }], description: Shuts down a running cluster and removes all nodes belonging to the cluster. The plan for the cluster is retained. Warning: this will lose all cluster data that is not saved in a snapshot repository., x-doc: { tag: Clusters - Elasticsearch - Commands }, tags: [ClustersElasticsearch], operationId: shutdown-es-cluster, parameters: [{ name: cluster_id, in: path, description: Identifier for the Elasticsearch cluster, type: string, required: true }], summary: Shut down cluster, responses: { 202: { description: The shutdown command was issued successfully, use the \GET\ command on the /{cluster_id} resource to monitor progress, schema: { $ref: #/definitions/EmptyResponse } }, 404: { description: The cluster specified by {cluster_id} cannot be found (code: 'clusters.cluster_not_found'), schema: { $ref: #/definitions/BasicFailedReply } }, 449: { description: When running as an administrator (other than root), sudo is required (code: 'root.needs_sudo'), schema: { $ref: #/definitions/BasicFailedReply } } } } } Also, note that the Swagger version is available in the JSON definition. 
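Before generating anything, it can be useful to poke at the specification itself. The sketch below (Python requests; ece-host is the same placeholder used throughout this post, and certificate verification is disabled to mirror the -k flag in the curl examples later on) downloads the spec and lists every operation it defines:
import requests
# verify=False mirrors the -k flag used with curl elsewhere in this post.
spec = requests.get("https://ece-host:12443/api/v1/api-docs/swagger.json", verify=False).json()
print("swagger version:", spec["swagger"])
for path, methods in spec["paths"].items():
    for verb, operation in methods.items():
        if isinstance(operation, dict):  # skip path-level entries such as shared parameters
            print(verb.upper(), path, "->", operation.get("operationId"))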
At the time of this blog post, we’re on version 2.0 of the specification: swagger: 2.0 Swagger Editor The Swagger website provides a great online tool called which parses the specification and produces friendly documentation, which you can navigate through (similar to our API reference). Now that you have the spec, go ahead and try pasting it into the editor. Hopefully, the benefits of using the OpenAPI specification will become obvious! Generating a client Generating a client is actually very straightforward. The Swagger Editor gives you the ability to generate clients in many different languages with just the click of a button. This is great for experimentation. However, in practice you'll likely want to use a library-based or command-line driven code generator for automating, configuring, and even customizing the generation of your client. Introducing swagger-codegen The official Swagger code generator is Swagger Codegen and is used in this blog post. Depending on your language of choice however, there may be other third-party implementations ( for example, which is a very popular golang implementation). Swagger Codegen is entirely Java-based, but it supports generating clients in many different languages. As an example, we're going to generate both a Java and a Python client. The first step is to obtain Swagger Codegen. Following the , there are a few ways to go about doing so, but we'll try to stay as platform-agnostic as possible and will simply download and use the JAR file. You'll need Java >= 7 installed on your system. Running swagger-codegen Once you've downloaded the JAR file, you can generate a client with just a few commands. First let's create a directory that will hold the generated client source code. This is not required, but desirable since by default swagger-codegen will spit out the files within the same directory that the command is run from. mkdir java-ece-client Next, we'll invoke swagger-codegen to actually generate the client, pointing it at our newly created directory: mkdir java-ece-client java -jar swagger-codegen-cli-2.2.1.jar generate \ -i https://ece-host:12443/api/v1/api-docs/swagger.json \ -l java \ -o java-ece-client That's really all there is to it! You've just generated a Java client for Elastic Cloud Enterprise. If you cd into the directory, you'll see all of the supporting source code files complete with unit tests. Let's take a closer look at the options we specified when we invoked swagger-codegen: -i: The location of our Elastic Cloud Enterprise OpenAPI specification. This can be a local file, or you can point it directly to the Swagger endpoint in your running ECE installation. -l: The language we want to generate the client in, hence java to generate a Java client. -o: The directory in which to place the generated source files. Generating a client in a different language is as simple as changing the argument to the -l option. For instance, to generate a Python client instead, just run: mkdir python-ece-client java -jar swagger-codegen-cli-2.2.1.jar generate \ -i https://ece-host:12443/api/v1/api-docs/swagger.json \ -l python \ -o python-ece-client Check out the for the full list of supported languages. A few usage examples Now that we have a generated client, let's take a look at a few examples of how we can use it. In our previous blog post, , we showed how to use the REST API directly (via curl) to retrieve and create Elasticsearch clusters. Let's mimic these examples, but this time using the client.
We're going to use the Java client, but these examples should be semantically equivalent to any other language client generated by Swagger Codegen. Initializing the client The first step is to initialize an ApiClient instance with the URL and credentials to our Elastic Cloud Enterprise API: ApiClient client = new ApiClient(); client.setBasePath("https://ece-host:12443/api/v1"); client.addDefaultHeader("Authorization", "Basic " + Base64.getEncoder().encodeToString("user:pass".getBytes(StandardCharsets.UTF_8))); The API uses basic authentication, so our credentials here are simply just our ECE username and password, base64-encoded. Retrieving Elasticsearch clusters Now that we have an ApiClient instance, we can use the ClustersElasticsearchApi class to start interacting with Elasticsearch clusters. A curl request for retrieving all clusters in Elastic Cloud Enterprise, complete with all of its parameters, looks like the following: curl -k -X GET -u user:pass 'https://ece-host:12443/api/v1/clusters/elasticsearch?from=0&size=10&show_security=false&show_metadata=true&show_plans=false&show_plan_defaults=true&show_system_alerts=0&show_hidden=false' We can execute the same request using the client as follows: ClustersElasticsearchApi elasticsearchApi = new ClustersElasticsearchApi(client); try { ElasticsearchClustersInfo response = elasticsearchApi.getEsClusters(0, 10, false, true, false, false, 0, false); for (ElasticsearchClusterInfo cluster : response.getElasticsearchClusters()) { System.out.println(cluster.getClusterName()); } } catch (ApiException ex) { System.out.println("Oops! " + ex.getMessage()); } Note the strongly-typed request and response classes. No need to fiddle with JSON strings or maps! Creating an Elasticsearch cluster Next, let’s create an Elasticsearch cluster named “My First Cluster”, just as we did in the previous blog. Using curl, this would look like: curl -k -X POST -u user:pass https://ece-host:12443/api/v1/clusters/elasticsearch -H 'content-type: application/json' -d '{ "cluster_name": "My First Cluster", "plan": { "zone_count": 1, "cluster_topology": [{ "memory_per_node": 1024, "node_count_per_zone": 1 }], "elasticsearch": { "version": "5.5.2" } } }' And now the Java API equivalent: ElasticsearchClusterTopologyElement topology = new ElasticsearchClusterTopologyElement(); topology.setMemoryPerNode(1024); topology.setNodeCountPerZone(1); ElasticsearchConfiguration configuration = new ElasticsearchConfiguration(); configuration.setVersion("5.5.2"); ElasticsearchClusterPlan plan = new ElasticsearchClusterPlan(); plan.setZoneCount(1); plan.setElasticsearch(configuration); plan.setClusterTopology(Arrays.asList(topology)); CreateElasticsearchClusterRequest request = new CreateElasticsearchClusterRequest(); request.setClusterName("My First Cluster"); request.setPlan(plan); try { ClusterCrudResponse response = elasticsearchApi.createEsCluster(request, false); System.out.println(response.getElasticsearchClusterId()); } catch (ApiException ex) { System.out.println("Oops! " + ex.getMessage()); } Known issues When generating a client using the official Swagger Codegen distribution, we've encountered several issues with different languages. For example, for Scala we found the following issues, which we maintain a for: Depending on your language of choice, it is possible that you may run into similar issues. We recommend that you open an issue in the official -- they have been very responsive to pull requests and bug reports from our experience.
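If you do hit one of those generator issues, note that nothing stops you from talking to the REST endpoints directly while you sort it out. For comparison with the Java examples above, here is a hedged Python sketch of the cluster-listing call using plain requests; the elasticsearch_clusters and cluster_name field names are inferred from the generated ElasticsearchClustersInfo model, so double-check them against your own spec.
import base64
import requests
# Same basic-auth header construction as the Java ApiClient example above.
headers = {"Authorization": "Basic " + base64.b64encode(b"user:pass").decode("ascii")}
resp = requests.get(
    "https://ece-host:12443/api/v1/clusters/elasticsearch",
    params={"from": 0, "size": 10, "show_metadata": "true"},
    headers=headers,
    verify=False,  # mirrors the -k flag in the curl examples above
)
# Field names assumed from the generated Java model (getElasticsearchClusters / getClusterName).
for cluster in resp.json().get("elasticsearch_clusters", []):
    print(cluster.get("cluster_name"))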
Conclusion To summarize, we covered how to obtain the Elastic Cloud Enterprise OpenAPI specification and how to generate a REST client from it using Swagger Codegen. We also mentioned some issues you may run into with using the official Swagger Codegen distribution. Also, depending on the language you wish to generate a client for, there may be other community third-party generators available. Since OpenAPI is just a specification that our API adheres to, you can even write your own generator!
Pods, Tokens, and a Little Glue: Integrating Kubernetes and Vault in Elastic Infrastructure;Pods, Tokens, and a Little Glue: Integrating Kubernetes and Vault on the Elastic DevOps Team;/blog/kubernetes-vault-integration-devops-team;Tyler Langlois;September 19, 2017;Engineering;; On Elastic's Infrastructure team, we are always looking for opportunities to introduce automation and reduce operational burdens. Function-as-a-service solutions such as AWS Lambda have allowed us to simplify operations for some services, but applications that require persistent runtimes have different challenges. There are many potential solutions to this problem: fleets of autoscaled AWS ECS instances, platforms such as Google App Engine or AWS Elastic Beanstalk, and more. Our work on this has led to investigating various container schedulers - and with such a large community and broad feature set, Kubernetes has proven a useful platform for our use case. Adopting a solution as powerful and flexible as Kubernetes entails some non-trivial work to bring it into an existing operations workflow, not the least of which is providing secrets or otherwise sensitive information to running applications within a Pod or Deployment. This post will explain our approach to integrating (our chosen secret management solution) with Kubernetes. Introduction Integrating Kubernetes and Vault should meet a few requirements: Moreover, applications should be able to migrate to Kubernetes without dramatic changes in order to consume secrets - this is critical to aid in migration from traditional platforms like AWS EC2 to Kubernetes. Background Before diving into the implementation, there are a few details specific to our solution that are worth highlighting: Implementation There are a few steps that come together to achieve the goals outlined in the introduction: first, connecting Kubernetes to Vault; second, providing a mechanism to expose secrets to running applications; and third, creating reusable tools to aid in generic solutions for additional applications. Connecting Kubernetes and Vault Fortunately, the kubernetes-vault project provides a well-designed solution to this problem. It ensures that tokens are passed to applications securely in transit, meeting the first requirement for integrating with Vault. Our deployment of kubernetes-vault looks similar to the one outlined in the project's quick start guide, with the exception that Vault is run outside of Kubernetes as an independent service. This works equally well as running Vault within Kubernetes itself, which is important as a variety of services outside of our Kubernetes cluster also consume Vault for different use cases. One important consideration when deploying the kubernetes-vault deployment is to ensure that the token passed to the service is . When interacting with Vault as a limited-privilege user, any generated tokens are subject to expiry when their parent token is revoked or expired. Exposing Secrets to Applications Leveraging kubernetes-vault brings us to the point that a JSON-formatted file is available with several values including the Vault token. While consuming this file can be done with a little code for applications, retrofitting existing applications can be challenging. Fortunately, many applications can easily consume environment variables, which provides a convenient way to pass secret values without needing to write them out to a persistent file or in a Pod/Deployment spec. Another tool called can automate this process. To illustrate how this works, consider a for Logstash (which can ).
To illustrate how vaultenv works, consider a Deployment for Logstash. How can we pass an HTTP basic authentication username and password to the elasticsearch output so it can securely index events to an external, secured cluster? First, we define a ConfigMap that describes which secrets to reference from Vault:

apiVersion: v1
kind: ConfigMap
metadata:
  name: logstash-secrets
data:
  logstash.secrets: |
    ELASTICSEARCH_USERNAME=elasticsearch/production#username
    ELASTICSEARCH_PASSWORD=elasticsearch/production#password

This instructs vaultenv to expose the secret stored in Vault under the path elasticsearch/production with the key username as the environment variable ELASTICSEARCH_USERNAME (and likewise for the password). The ConfigMap is mounted into the pod at /etc/secrets.d as a de facto directory for vaultenv secrets. In a similar fashion, we can define our Logstash config as a ConfigMap, interpolating the named variables directly into the configuration file:

apiVersion: v1
kind: ConfigMap
metadata:
  name: logstash-config
data:
  example.conf: |
    input {
      exec {
        command => "date"
        interval => 5
      }
    }
    output {
      elasticsearch {
        index => "logstash-%{+YYYY}"
        hosts => ["https://elasticsearch.cluster.url:9200"]
        user => "${ELASTICSEARCH_USERNAME}"
        password => "${ELASTICSEARCH_PASSWORD}"
        ssl => true
      }
    }

Ensuring that the vaultenv executable is present in the container can be done in a Dockerfile with a few simple commands, or with a multi-stage build that compiles vaultenv locally and copies it into the final container image. Additionally, installing jq makes consuming the token file easier. Finally, the container's command can be defined. For the Logstash image, the default entrypoint is docker-entrypoint, so we simply wrap that script in a vaultenv invocation:

token=$(jq -r '.clientToken' /var/run/secrets/boostport.com/vault-token)
exec vaultenv \
  --host $VAULT_ADDR \
  --token $token \
  --secrets-file /etc/secrets.d/logstash.secrets \
  /usr/local/bin/docker-entrypoint

Coupled with an init container as explained in the kubernetes-vault documentation, vaultenv will retrieve the configured secrets and invoke docker-entrypoint with the ELASTICSEARCH_USERNAME and ELASTICSEARCH_PASSWORD environment variables set. This pattern permits the token to be securely passed to vaultenv, and lets us modify the original application container only minimally: it runs the same command as before, wrapped in the environment variables that vaultenv provides. Managing Periodic Tokens Like the token used for kubernetes-vault, defining a TTL for tokens issued to an application is a best practice to ensure that access is revoked regularly for deleted pods and old tokens. However, vaultenv only executes once before passing control to its command argument, so if the pod's container dies for any reason it will be restarted - and the token may have expired in the intervening period. Fortunately, because the token JSON file lives on a volume shared across the pod's containers, we can consume the token from a sidecar container and renew it out-of-band from the pod's main container. In our environment, we host a small token renewal image on our Docker registry that has the simple task of reading the token JSON file and renewing the token regularly. This permits us to re-use the image in any Deployment definitions that require it, without needing to change the application container in any way. As an example, the Python daemon's renewal logic could be as simple as this (using the hvac library):

while True:
    lookup = vault.lookup_token()['data']
    interval = lookup['creation_ttl'] / 2
    if lookup['ttl'] <= interval:
        print('renewing vault token')
        vault.renew_token()
    sleep(10)

This loop simply renews the Vault token if it has less than half its TTL left, checking every ten seconds (the actual logic we use is slightly different, but this is simpler to illustrate).
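Filling in the surrounding boilerplate, a self-contained sketch of such a renewal sidecar might look like the following. This is illustrative rather than our exact daemon: it assumes an hvac release that exposes the top-level lookup_token and renew_token helpers, that VAULT_ADDR is set in the sidecar's environment, and it intentionally omits error handling.

# Hypothetical token-renewal sidecar - a runnable sketch of the loop above.
import json
import os
from time import sleep

import hvac

TOKEN_FILE = '/var/run/secrets/boostport.com/vault-token'

# Read the client token deposited by kubernetes-vault from the shared volume.
with open(TOKEN_FILE) as f:
    token = json.load(f)['clientToken']

vault = hvac.Client(url=os.environ['VAULT_ADDR'], token=token)

while True:
    # Renew once the token has burned through half of its period.
    lookup = vault.lookup_token()['data']
    if lookup['ttl'] <= lookup['creation_ttl'] / 2:
        print('renewing vault token')
        vault.renew_token()
    sleep(10)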
Connection exceptions to Vault are also important to handle. Adding this automatic renewal to an existing Deployment becomes as simple as adding one more container to the Kubernetes yaml configuration:

- name: token-renewer
  image: private.registry.url/token-renewer
  volumeMounts:
    - name: vault-token
      mountPath: /var/run/secrets/boostport.com

If the main container is restarted for any reason, vaultenv will have a still-valid token with which to request secrets again and invoke the container command with newly fetched values. An Example Kubernetes Definition Yaml configuration files for kubernetes-vault and related specifications are likely deployment-specific, but the pieces shown above come together into a complete example definition. With the necessary yaml additions, many specifications can re-use this pattern to securely reference Vault secrets with limited policies, ensuring that the principle of least privilege is followed. Final Thoughts This example illustrates how to create applications that are easy to scale and manage thanks to Kubernetes, and that can safely consume secrets from a highly secure store such as Vault. In practice, this has proven to be a useful system design for our use case. In particular, using Vault as a cloud-agnostic secret store has permitted us to run Kubernetes in any arbitrary environment and consume Vault APIs regardless of where kubernetes-vault is running, whether in AWS, GCP, or otherwise. Relying on kubernetes-vault's response wrapping to securely deposit a token in a container alleviates some of the tension around distributing initial access credentials, and vaultenv has allowed us to provide an application-agnostic way of exposing those fetched secrets to the applications that need them. Update: As of today, Vault supports a native Kubernetes authentication method. While the approach laid out in this post obviously predates this technique, some of its concepts (such as leveraging vaultenv for services that do not natively consume Vault tokens) are still useful to consider.
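For completeness, here is a minimal, hypothetical sketch of what that native flow looks like; it is not part of the implementation described above. The pod's service account JWT is exchanged for a Vault token via the Kubernetes auth method over Vault's HTTP API - the auth/kubernetes mount point and the 'logstash' role name below are assumptions for illustration only.

# Hypothetical sketch: exchange a pod's service account JWT for a Vault token
# using the native Kubernetes auth method. Assumes the auth method is mounted
# at auth/kubernetes and that a role named 'logstash' has been configured.
import os

import requests

SA_TOKEN = '/var/run/secrets/kubernetes.io/serviceaccount/token'

with open(SA_TOKEN) as f:
    jwt = f.read()

resp = requests.post(
    os.environ['VAULT_ADDR'] + '/v1/auth/kubernetes/login',
    json={'role': 'logstash', 'jwt': jwt},
)
resp.raise_for_status()

# The returned client token can then be consumed by vaultenv or hvac
# just as the token deposited by kubernetes-vault was.
vault_token = resp.json()['auth']['client_token']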