{"id":2043,"date":"2019-02-15T14:19:00","date_gmt":"2019-02-15T14:19:00","guid":{"rendered":"https:\/\/dev.hypersense-software.com\/blog\/?p=2043"},"modified":"2024-09-12T14:29:50","modified_gmt":"2024-09-12T11:29:50","slug":"aws-athena-parsing-apache-nginx-logs","status":"publish","type":"post","link":"https:\/\/hypersense-software.com\/blog\/2019\/02\/15\/aws-athena-parsing-apache-nginx-logs\/","title":{"rendered":"AWS Athena &#8211; Parsing apache, nginx and AWS ELB access logs"},"content":{"rendered":"\n<h2 class=\"wp-block-heading\" id=\"h-if-something-can-go-wrong-it-probably-will\"><strong>If something can go wrong, it probably will<\/strong><\/h2>\n\n\n\n<p>We\u2019ve encountered a problem on one of our servers. For obvious reasons, let\u2019s name the main server, \u201cX\u201d. At one time, the systems\u2019 performance and status needed to be checked, because some of the calls responded unexpectedly. We had many logs, \u201cmany\u201d being an understatement, since X relied on communicating with other servers Y and Z. So we had a collection of servers for different functionalities, each having its own logs.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-needle-in-a-haystack\"><strong>Needle in a haystack<\/strong><\/h2>\n\n\n\n<p>Exporting all the logs:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>apache access logs<\/li>\n\n\n\n<li>apache error logs<\/li>\n\n\n\n<li>nginx logs<\/li>\n<\/ul>\n\n\n\n<p>What do programmers do when faced with a lot of data? They use patterns, in more geek-ish terms:&nbsp;<strong>Regex.<\/strong><\/p>\n\n\n\n<p>If you\u2019re not familiar with the term, go here:&nbsp;<a href=\"https:\/\/en.wikipedia.org\/wiki\/Regular_expression\" rel=\"noreferrer noopener\" target=\"_blank\">https:\/\/en.wikipedia.org\/wiki\/Regular_expression<\/a>&nbsp;and then come back. 
If you\u2019re wondering whether you need them, remember that you have already used this kind of pattern matching: checking whether a string contains another string in SQL with \u201clike\u201d, exploding a string on a delimiter, etc.<\/p>\n\n\n\n<p>Basically, we\u2019ll use a regex to look for a given pattern in a string, where the string is a single log line.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-all-roads-lead-to-rome\"><strong>All roads lead to Rome<\/strong><\/h2>\n\n\n\n<p>Before we get started: this will take a while, so if you\u2019re in a hurry, skip to the last chapter and try the already built solution, in case it\u2019s what you need. Then again, if you continue reading, you might learn more for the future.<\/p>\n\n\n\n<p>We can parse strings in many languages, but most often we select the one most suitable for us.<\/p>\n\n\n\n<p>So we can analyse logs in C, Java, PHP, Swift\u2026 you get the point.<\/p>\n\n\n\n<p>But since we\u2019re on AWS, the most suitable for us would be Athena, provided by \u2026 AWS.<\/p>\n\n\n\n<p>So now we have a sample of how to parse ELB logs:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>CREATE EXTERNAL TABLE IF NOT EXISTS elb_logs (\nrequest_timestamp string,\nelb_name string,\nrequest_ip string,\nrequest_port int,\nbackend_ip string,\nbackend_port int,\nrequest_processing_time double,\nbackend_processing_time double,\nclient_response_time double,\nelb_response_code string,\nbackend_response_code string,\nreceived_bytes bigint,\nsent_bytes bigint,\nrequest_verb string,\nurl string,\nprotocol string,\nuser_agent string,\nssl_cipher string,\nssl_protocol string\n)\nROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'\nWITH 
SERDEPROPERTIES (\n'serialization.format' = '1',\n'input.regex' = '(&#091;^ ]*) (&#091;^ ]*) (&#091;^ ]*):(&#091;0-9]*) (&#091;^ ]*)&#091;:\\-](&#091;0-9]*) (&#091;-.0-9]*) (&#091;-.0-9]*) (&#091;-.0-9]*) (|&#091;-0-9]*) (-|&#091;-0-9]*) (&#091;-0-9]*) (&#091;-0-9]*) \\\\\\\"(&#091;^ ]*) (&#091;^ ]*) (- |&#091;^ ]*)\\\\\\\" (\\\"&#091;^\\\"]*\\\") (&#091;A-Z0-9-]+) (&#091;A-Za-z0-9.-]*)$' )\nLOCATION 's3:\/\/your_log_bucket\/prefix\/AWSLogs\/AWS_account_ID\/elasticloadbalancing\/';\n<\/code><\/pre>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-change-bucket-and-press-run\"><strong>Change bucket and press run?<\/strong><\/h2>\n\n\n\n<p>If that worked, you wouldn\u2019t be here. But you are, so what went wrong? Either you didn\u2019t get any data, or the table\u2019s structure isn\u2019t exactly what you want. Either way, blame the regex.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>(&#091;^ ]*) (&#091;^ ]*) (&#091;^ ]*):(&#091;0-9]*) (&#091;^ ]*)&#091;:\\-](&#091;0-9]*) (&#091;-.0-9]*) (&#091;-.0-9]*) (&#091;-.0-9]*) (|&#091;-0-9]*) (-|&#091;-0-9]*) (&#091;-0-9]*) (&#091;-0-9]*) \\\\\\\"(&#091;^ ]*) (&#091;^ ]*) (- |&#091;^ ]*)\\\\\\\" (\\\"&#091;^\\\"]*\\\") (&#091;A-Z0-9-]+) (&#091;A-Za-z0-9.-]*)$\n<\/code><\/pre>\n\n\n\n<p>What does that mean? First of all, your logs might use a different format, so the parser didn\u2019t find what it was looking for. Let\u2019s build our own parser.<\/p>\n\n\n\n<p>First, open&nbsp;<a href=\"https:\/\/regex101.com\/\" rel=\"noreferrer noopener\" target=\"_blank\">https:\/\/regex101.com<\/a>&nbsp;(there are other options, but we like this one) in a new tab and move it to a separate window.<\/p>\n\n\n\n<p>Take a single log line and use it as a test string. 
It will make your life easier if you know how the logs were written, but it\u2019s ok if you don\u2019t.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-step-by-step-goes-the-algorithm\"><strong>Step by step, goes the algorithm<\/strong><\/h2>\n\n\n\n<p>Assuming you can\u2019t get it all in your first attempt, we\u2019ll break down the log step by step.<\/p>\n\n\n\n<p>Basic things to know:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>\u201c-\u201d will appear when a field is empty<\/li>\n\n\n\n<li>AWS adds a date in ISO format at the start of each line<\/li>\n\n\n\n<li>the date might appear twice<\/li>\n\n\n\n<li>], [, ), (, * and + are reserved in regex, and we escape \" as well; if you need any of them as a literal character, put \u201c\\\u201d in front of it<\/li>\n\n\n\n<li>([^ ]*) means any characters until you hit a space<\/li>\n\n\n\n<li>([.0-9]*) works great for doubles and IPs, unless the logs print numbers with \u201c,\u201d as the decimal separator; if so, replace \u201c.\u201d with \u201c,\u201d (DON\u2019T USE \u201c,\u201d for IPs)<\/li>\n\n\n\n<li>([^ ]*)T([^ ]*)Z parses an ISO date into a date and a time<\/li>\n\n\n\n<li>([^&lt;somechar&gt;]*) matches everything up to &lt;somechar&gt;; we use it to ignore bits of the string we don\u2019t need. We did mention the date appears twice, so we don\u2019t need the second copy<\/li>\n\n\n\n<li>([^\\n]*) will read the rest of the log line<\/li>\n\n\n\n<li>'serialization.format' = '1' is a SerDe property copied from the sample above; leave it as it is<\/li>\n\n\n\n<li>you can pass a folder in the bucket as the location and its subfolders will be checked as well (since logs are exports, use an auto-generated string)<\/li>\n<\/ul>\n\n\n\n<pre class=\"wp-block-code\"><code>CREATE EXTERNAL TABLE IF NOT EXISTS webserver_proxy_access_logs (\nrequest_date 
string,\nrequest_time string,\nall_else string\n)\nROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'\nWITH SERDEPROPERTIES (\n'serialization.format' = '1',\n'input.regex' = '(&#091;^ ]*)T(&#091;^ ]*)Z (&#091;^\\n]*)' )\nLOCATION 's3:\/\/your_log_bucket\/prefix\/AWSLogs\/AWS_account_ID\/elasticloadbalancing\/';\n<\/code><\/pre>\n\n\n\n<p>This will work on any AWS log line that starts with an ISO timestamp and can be used exactly as written above; the resulting table can be queried with regular SQL. Bookmark it (Cmd+D) until AWS decides to change something.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-are-we-there-yet\"><strong>Are we there yet?<\/strong><\/h2>\n\n\n\n<p>If you\u2019re still reading, it means you have time and no one is acting like a 5-year-old asking if we\u2019re there yet.<\/p>\n\n\n\n<p>Start with a small step and write the regex for the date and time section:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>(&#091;^ ]*)T(&#091;^ ]*)Z (&#091;^\\n]*)<\/code><\/pre>\n\n\n\n<p>This looks for 3 fields:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>the first is everything before \u201cT\u201d and cannot contain a space<\/li>\n\n\n\n<li>the second is everything between \u201cT\u201d and \u201cZ\u201d and cannot contain a space<\/li>\n\n\n\n<li>the last is everything left after the space that follows \u201cZ\u201d<\/li>\n<\/ul>\n\n\n\n<p>Example applied to an access log:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code><strong>2018-01-18<\/strong>T<strong>10:12:15.776<\/strong>Z <strong>196.233.218.12 - - &#091;18\/Jan\/2018:10:12:14 +0000] \"GET \/p1=value HTTP\/1.1\" 200 15804 \"-\" \"Mozilla\/5.0 (Linux; Android 6.0; CRO-L22 Build\/HUAWEICRO-L22) AppleWebKit\/537.36 (KHTML, like Gecko) Chrome\/52.0.2743.98 Mobile Safari\/537.36\" \"-\" \"upstream: 127.0.0.106:80\" 0.203 0.203<\/strong><\/code><\/pre>\n\n\n\n<p>Assuming we need the IP, for \u2026 isolating calls from a given IP to check spam\u2026. 
or any other reason.<\/p>\n\n\n\n<p>Above, we gave a regex for IPs:&nbsp;<strong>([.0-9]*)<\/strong>; you can also use the general one:&nbsp;<strong>([^ ]*)<\/strong><\/p>\n\n\n\n<p>By the way,<strong>&nbsp;([.:0-9]*)&nbsp;<\/strong>can be used for IP:port\u2026 I\u2019ll attach a table of patterns, unless a 5-year-old pops up next to me.<\/p>\n\n\n\n<p>So the regex will be:&nbsp;<strong>([^ ]*)T([^ ]*)Z ([.0-9]*) ([^\\n]*)<\/strong><\/p>\n\n\n\n<p>Go back to&nbsp;<a href=\"https:\/\/regex101.com\/\" rel=\"noreferrer noopener\" target=\"_blank\">https:\/\/regex101.com\/<\/a>&nbsp;and check the breakdown. Each new group means a new column in the table, here&nbsp;<strong>request_IP string.<\/strong><\/p>\n\n\n\n<p>Let\u2019s ignore the second date, i.e. skip everything until the first \"<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>([^&lt;somechar&gt;]*)&nbsp;<\/strong>becomes<strong>&nbsp;([^\"]*)<\/strong><\/li>\n<\/ul>\n\n\n\n<p>The new regex is:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>(&#091;^ ]*)T(&#091;^ ]*)Z (&#091;.0-9]*) (&#091;^\"]*) (&#091;^\\n]*)<\/code><\/pre>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-do-while\"><strong>do&#8230;while<\/strong><\/h2>\n\n\n\n<p>So, in short:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>add a new group<\/li>\n\n\n\n<li>add the column<\/li>\n\n\n\n<li>test at&nbsp;<a href=\"https:\/\/regex101.com\/\" rel=\"noreferrer noopener\" target=\"_blank\">https:\/\/regex101.com\/<\/a><\/li>\n\n\n\n<li>run the create table statement on the logs<\/li>\n\n\n\n<li>check the results<\/li>\n\n\n\n<li>repeat<\/li>\n<\/ul>\n\n\n\n<p>When you have added everything, you can remove the&nbsp;<strong>([^\\n]*)&nbsp;<\/strong>and the<strong>&nbsp;all_else&nbsp;<\/strong>field. 
If you\u2019re not sure, leave them in, but make sure to remove the space before the group in the regex, so that lines with nothing after the last field still match.<\/p>\n\n\n\n<p>If the regex ends with&nbsp;<strong>([^ ]*) ([^\\n]*)&nbsp;<\/strong>change it to<strong>&nbsp;([^ ]*)([^\\n]*)<\/strong><\/p>\n\n\n\n<p>Last step: count the rows where request_date is null. These are the lines the regex failed to parse. 200 out of 90 mil can be considered an acceptable error.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-it-s-peaceful-here\"><strong>It&#8217;s peaceful here<\/strong><\/h2>\n\n\n\n<p>As promised, here is what we ended up with:&nbsp;<a href=\"https:\/\/github.com\/HyperSense-Software\/aws-athena-apache-nginx-access-logs\" rel=\"noreferrer noopener\" target=\"_blank\">https:\/\/github.com\/HyperSense-Software\/aws-athena-apache-nginx-access-logs<\/a>.&nbsp;<\/p>\n\n\n\n<p>Here are the patterns you might need:<\/p>\n\n\n\n<p><strong>([^&lt;somechar&gt;]*)&nbsp;<\/strong>\u2013 string from the current index until the first &lt;somechar&gt;; several of the patterns below are applications of it<\/p>\n\n\n\n<p><strong>(&lt;some_regex&gt;){0,1}&nbsp;<\/strong>\u2013 extracts the part matching &lt;some_regex&gt;, but only if it is present \u2013 useful when the logs mix multiple formats<\/p>\n\n\n\n<p><strong>(.*)<\/strong>&nbsp;\u2013 any string from the current index until the end<\/p>\n\n\n\n<p><strong>([^ ]*)&nbsp;<\/strong>\u2013 string from the current index until the first space<\/p>\n\n\n\n<p><strong>([^\\n]*)<\/strong>&nbsp;\u2013 string from the current index until the first new line, used to build the regex step by step<\/p>\n\n\n\n<p><strong>([^ ]*)T([^ ]*)Z&nbsp;<\/strong>\u2013 extracts date and time, 2 fields<\/p>\n\n\n\n<p><strong>([:.0-9]*)&nbsp;<\/strong>\u2013 extracts an IP (with or without port)<\/p>\n\n\n\n<p><strong>([.0-9]*):([0-9]*)&nbsp;<\/strong>\u2013 extracts IP and port, 2 fields<\/p>\n\n\n\n<p><strong>([.0-9]*)<\/strong>&nbsp;\u2013 extracts an IP or a double<\/p>\n\n\n\n<p><strong>([0-9]*)<\/strong>&nbsp;\u2013 extracts 
int<\/p>\n\n\n\n<p><strong>(\\\"[^\\\"]*\\\")<\/strong>&nbsp;\u2013 string delimited by \", with the quotes kept as part of the field<\/p>\n\n\n\n<p><strong>\\\"([^\\\"]*)\\\"<\/strong>&nbsp;\u2013 string delimited by \", but won\u2019t take the \" as part of the field<\/p>\n\n\n\n<p><strong>\\\"([^ ]*)([^\\\"]*)\\\"&nbsp;<\/strong>\u2013 string delimited by \", with the first word separated into its own field (use it to extract the request method when logs contain \"GET \/p1=v1\")<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-more-more-more\"><strong>More more more<\/strong><\/h2>\n\n\n\n<p>There are many more regex patterns that can be used. If a field is hard to capture with the regex alone, capture a wider string and refine it with SQL when making queries.<\/p>\n\n\n\n<p>To download the queries used in this article, please visit our&nbsp;<a href=\"https:\/\/github.com\/HyperSense-Software\/aws-athena-apache-nginx-access-logs\" rel=\"noreferrer noopener\" target=\"_blank\">aws-athena-apache-nginx-access-logs repository<\/a>&nbsp;on GitHub.<\/p>\n\n\n\n<p>If you found a pattern you needed that wasn\u2019t listed above, leave a comment and we\u2019ll add it shortly.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>If something can go wrong, it probably will We\u2019ve encountered a problem on one of our servers. For obvious reasons, let\u2019s name the main server, \u201cX\u201d. 
At one time, the systems\u2019 performance and status needed to be checked, because some&hellip;<\/p>\n","protected":false},"author":2,"featured_media":2044,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"content-type":"","_lmt_disableupdate":"","_lmt_disable":"","footnotes":""},"categories":[33,217],"tags":[],"class_list":["post-2043","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-cloud-serverless-computing","category-web-development"],"featured_image_src":"https:\/\/hypersense-software.com\/blog\/wp-content\/uploads\/2023\/04\/nNpbvBxo_2x.jpg","author_info":{"display_name":"Andrei Neacsu","author_link":"https:\/\/hypersense-software.com\/blog\/author\/andrei-neacsu\/"},"yoast_head":"<!-- This site is optimized with the Yoast SEO Premium plugin v26.7 (Yoast SEO v26.7) - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>AWS Athena - Parsing apache, nginx and AWS ELB access logs<\/title>\n<meta name=\"description\" content=\"A quick read on how to parse apache and nginx logs using AWS Athena. Offering a solution for parsing apache and nginx output, logs are processed and exported into a structured format that supports advanced SQL queries\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/hypersense-software.com\/blog\/2019\/02\/15\/aws-athena-parsing-apache-nginx-logs\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"AWS Athena - Parsing apache, nginx and AWS ELB access logs\" \/>\n<meta property=\"og:description\" content=\"A quick read on how to parse apache and nginx logs using AWS Athena. 
Offering a solution for parsing apache and nginx output, logs are processed and exported into a structured format that supports advanced SQL queries\" \/>\n<meta property=\"og:url\" content=\"https:\/\/hypersense-software.com\/blog\/2019\/02\/15\/aws-athena-parsing-apache-nginx-logs\/\" \/>\n<meta property=\"og:site_name\" content=\"HyperSense Blog\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/hypersense.software\" \/>\n<meta property=\"article:published_time\" content=\"2019-02-15T14:19:00+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2024-09-12T11:29:50+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/hypersense-software.com\/blog\/wp-content\/uploads\/2023\/04\/nNpbvBxo_2x.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"1360\" \/>\n\t<meta property=\"og:image:height\" content=\"766\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"Andrei Neacsu\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@HyperSenseSoft\" \/>\n<meta name=\"twitter:site\" content=\"@HyperSenseSoft\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Andrei Neacsu\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"6 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/hypersense-software.com\/blog\/2019\/02\/15\/aws-athena-parsing-apache-nginx-logs\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/hypersense-software.com\/blog\/2019\/02\/15\/aws-athena-parsing-apache-nginx-logs\/\"},\"author\":{\"name\":\"Andrei Neacsu\",\"@id\":\"https:\/\/hypersense-software.com\/blog\/#\/schema\/person\/ab8c2a667674a1b3926d6b1f0685ab3c\"},\"headline\":\"AWS Athena &#8211; Parsing apache, nginx and AWS ELB access logs\",\"datePublished\":\"2019-02-15T14:19:00+00:00\",\"dateModified\":\"2024-09-12T11:29:50+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/hypersense-software.com\/blog\/2019\/02\/15\/aws-athena-parsing-apache-nginx-logs\/\"},\"wordCount\":1258,\"publisher\":{\"@id\":\"https:\/\/hypersense-software.com\/blog\/#organization\"},\"image\":{\"@id\":\"https:\/\/hypersense-software.com\/blog\/2019\/02\/15\/aws-athena-parsing-apache-nginx-logs\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/hypersense-software.com\/blog\/wp-content\/uploads\/2023\/04\/nNpbvBxo_2x.jpg\",\"articleSection\":[\"Cloud &amp; Serverless Computing\",\"Web Development\"],\"inLanguage\":\"en-US\"},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/hypersense-software.com\/blog\/2019\/02\/15\/aws-athena-parsing-apache-nginx-logs\/\",\"url\":\"https:\/\/hypersense-software.com\/blog\/2019\/02\/15\/aws-athena-parsing-apache-nginx-logs\/\",\"name\":\"AWS Athena - Parsing apache, nginx and AWS ELB access 
logs\",\"isPartOf\":{\"@id\":\"https:\/\/hypersense-software.com\/blog\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/hypersense-software.com\/blog\/2019\/02\/15\/aws-athena-parsing-apache-nginx-logs\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/hypersense-software.com\/blog\/2019\/02\/15\/aws-athena-parsing-apache-nginx-logs\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/hypersense-software.com\/blog\/wp-content\/uploads\/2023\/04\/nNpbvBxo_2x.jpg\",\"datePublished\":\"2019-02-15T14:19:00+00:00\",\"dateModified\":\"2024-09-12T11:29:50+00:00\",\"description\":\"A quick read on how to parse apache and nginx logs using AWS Athena. Offering a solution for parsing apache and nginx output, logs are processed and exported into a structured format that supports advanced SQL queries\",\"breadcrumb\":{\"@id\":\"https:\/\/hypersense-software.com\/blog\/2019\/02\/15\/aws-athena-parsing-apache-nginx-logs\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/hypersense-software.com\/blog\/2019\/02\/15\/aws-athena-parsing-apache-nginx-logs\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/hypersense-software.com\/blog\/2019\/02\/15\/aws-athena-parsing-apache-nginx-logs\/#primaryimage\",\"url\":\"https:\/\/hypersense-software.com\/blog\/wp-content\/uploads\/2023\/04\/nNpbvBxo_2x.jpg\",\"contentUrl\":\"https:\/\/hypersense-software.com\/blog\/wp-content\/uploads\/2023\/04\/nNpbvBxo_2x.jpg\",\"width\":1360,\"height\":766,\"caption\":\"hypersense aws athena\"},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/hypersense-software.com\/blog\/2019\/02\/15\/aws-athena-parsing-apache-nginx-logs\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/hypersense-software.com\/blog\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"AWS Athena &#8211; Parsing apache, nginx and AWS ELB access 
logs\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/hypersense-software.com\/blog\/#website\",\"url\":\"https:\/\/hypersense-software.com\/blog\/\",\"name\":\"HyperSense Blog\",\"description\":\"Latest software development trends and insights\",\"publisher\":{\"@id\":\"https:\/\/hypersense-software.com\/blog\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/hypersense-software.com\/blog\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/hypersense-software.com\/blog\/#organization\",\"name\":\"HyperSense Software\",\"url\":\"https:\/\/hypersense-software.com\/blog\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/hypersense-software.com\/blog\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/hypersense-software.com\/blog\/wp-content\/uploads\/2023\/04\/logo-hypersense-512.svg\",\"contentUrl\":\"https:\/\/hypersense-software.com\/blog\/wp-content\/uploads\/2023\/04\/logo-hypersense-512.svg\",\"width\":64,\"height\":64,\"caption\":\"HyperSense Software\"},\"image\":{\"@id\":\"https:\/\/hypersense-software.com\/blog\/#\/schema\/logo\/image\/\"},\"sameAs\":[\"https:\/\/www.facebook.com\/hypersense.software\",\"https:\/\/x.com\/HyperSenseSoft\",\"https:\/\/www.instagram.com\/hypersensesoftware\/\",\"https:\/\/ro.pinterest.com\/HyperSenseSoft\/\",\"https:\/\/www.linkedin.com\/company\/hypersense-software\/\",\"https:\/\/www.behance.net\/hypersense\",\"https:\/\/www.youtube.com\/@hypersensesoftware\",\"https:\/\/github.com\/HyperSense-Software\"]},{\"@type\":\"Person\",\"@id\":\"https:\/\/hypersense-software.com\/blog\/#\/schema\/person\/ab8c2a667674a1b3926d6b1f0685ab3c\",\"name\":\"Andrei 
Neacsu\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/hypersense-software.com\/blog\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/3dedf5440207d67bade8089703be1d2424d9d03a74e060a0cac6c7e1d24b5009?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/3dedf5440207d67bade8089703be1d2424d9d03a74e060a0cac6c7e1d24b5009?s=96&d=mm&r=g\",\"caption\":\"Andrei Neacsu\"},\"description\":\"Andrei, CTO and co-founder of HyperSense Software Inc., has an extensive career spanning over 15 years in the tech industry. With hands-on experience in mobile and web development, cloud infrastructure, and DevOps, he has been instrumental in both startup launches and enterprise-level tech transformations. His approach intertwines deep technical knowledge with strategic business insights, aiding in everything from vision setting and market research to contract negotiations and investor relations. As a member of the Forbes Business Council, he consistently delivers valuable insights in the areas of technology and people management.\",\"url\":\"https:\/\/hypersense-software.com\/blog\/author\/andrei-neacsu\/\"}]}<\/script>\n<!-- \/ Yoast SEO Premium plugin. -->","yoast_head_json":{"title":"AWS Athena - Parsing apache, nginx and AWS ELB access logs","description":"A quick read on how to parse apache and nginx logs using AWS Athena. 
Offering a solution for parsing apache and nginx output, logs are processed and exported into a structured format that supports advanced SQL queries","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/hypersense-software.com\/blog\/2019\/02\/15\/aws-athena-parsing-apache-nginx-logs\/","og_locale":"en_US","og_type":"article","og_title":"AWS Athena - Parsing apache, nginx and AWS ELB access logs","og_description":"A quick read on how to parse apache and nginx logs using AWS Athena. Offering a solution for parsing apache and nginx output, logs are processed and exported into a structured format that supports advanced SQL queries","og_url":"https:\/\/hypersense-software.com\/blog\/2019\/02\/15\/aws-athena-parsing-apache-nginx-logs\/","og_site_name":"HyperSense Blog","article_publisher":"https:\/\/www.facebook.com\/hypersense.software","article_published_time":"2019-02-15T14:19:00+00:00","article_modified_time":"2024-09-12T11:29:50+00:00","og_image":[{"width":1360,"height":766,"url":"https:\/\/hypersense-software.com\/blog\/wp-content\/uploads\/2023\/04\/nNpbvBxo_2x.jpg","type":"image\/jpeg"}],"author":"Andrei Neacsu","twitter_card":"summary_large_image","twitter_creator":"@HyperSenseSoft","twitter_site":"@HyperSenseSoft","twitter_misc":{"Written by":"Andrei Neacsu","Est. 
reading time":"6 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/hypersense-software.com\/blog\/2019\/02\/15\/aws-athena-parsing-apache-nginx-logs\/#article","isPartOf":{"@id":"https:\/\/hypersense-software.com\/blog\/2019\/02\/15\/aws-athena-parsing-apache-nginx-logs\/"},"author":{"name":"Andrei Neacsu","@id":"https:\/\/hypersense-software.com\/blog\/#\/schema\/person\/ab8c2a667674a1b3926d6b1f0685ab3c"},"headline":"AWS Athena &#8211; Parsing apache, nginx and AWS ELB access logs","datePublished":"2019-02-15T14:19:00+00:00","dateModified":"2024-09-12T11:29:50+00:00","mainEntityOfPage":{"@id":"https:\/\/hypersense-software.com\/blog\/2019\/02\/15\/aws-athena-parsing-apache-nginx-logs\/"},"wordCount":1258,"publisher":{"@id":"https:\/\/hypersense-software.com\/blog\/#organization"},"image":{"@id":"https:\/\/hypersense-software.com\/blog\/2019\/02\/15\/aws-athena-parsing-apache-nginx-logs\/#primaryimage"},"thumbnailUrl":"https:\/\/hypersense-software.com\/blog\/wp-content\/uploads\/2023\/04\/nNpbvBxo_2x.jpg","articleSection":["Cloud &amp; Serverless Computing","Web Development"],"inLanguage":"en-US"},{"@type":"WebPage","@id":"https:\/\/hypersense-software.com\/blog\/2019\/02\/15\/aws-athena-parsing-apache-nginx-logs\/","url":"https:\/\/hypersense-software.com\/blog\/2019\/02\/15\/aws-athena-parsing-apache-nginx-logs\/","name":"AWS Athena - Parsing apache, nginx and AWS ELB access logs","isPartOf":{"@id":"https:\/\/hypersense-software.com\/blog\/#website"},"primaryImageOfPage":{"@id":"https:\/\/hypersense-software.com\/blog\/2019\/02\/15\/aws-athena-parsing-apache-nginx-logs\/#primaryimage"},"image":{"@id":"https:\/\/hypersense-software.com\/blog\/2019\/02\/15\/aws-athena-parsing-apache-nginx-logs\/#primaryimage"},"thumbnailUrl":"https:\/\/hypersense-software.com\/blog\/wp-content\/uploads\/2023\/04\/nNpbvBxo_2x.jpg","datePublished":"2019-02-15T14:19:00+00:00","dateModified":"2024-09-12T11:29:50+00:00","description":"A 
quick read on how to parse apache and nginx logs using AWS Athena. Offering a solution for parsing apache and nginx output, logs are processed and exported into a structured format that supports advanced SQL queries","breadcrumb":{"@id":"https:\/\/hypersense-software.com\/blog\/2019\/02\/15\/aws-athena-parsing-apache-nginx-logs\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/hypersense-software.com\/blog\/2019\/02\/15\/aws-athena-parsing-apache-nginx-logs\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/hypersense-software.com\/blog\/2019\/02\/15\/aws-athena-parsing-apache-nginx-logs\/#primaryimage","url":"https:\/\/hypersense-software.com\/blog\/wp-content\/uploads\/2023\/04\/nNpbvBxo_2x.jpg","contentUrl":"https:\/\/hypersense-software.com\/blog\/wp-content\/uploads\/2023\/04\/nNpbvBxo_2x.jpg","width":1360,"height":766,"caption":"hypersense aws athena"},{"@type":"BreadcrumbList","@id":"https:\/\/hypersense-software.com\/blog\/2019\/02\/15\/aws-athena-parsing-apache-nginx-logs\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/hypersense-software.com\/blog\/"},{"@type":"ListItem","position":2,"name":"AWS Athena &#8211; Parsing apache, nginx and AWS ELB access logs"}]},{"@type":"WebSite","@id":"https:\/\/hypersense-software.com\/blog\/#website","url":"https:\/\/hypersense-software.com\/blog\/","name":"HyperSense Blog","description":"Latest software development trends and insights","publisher":{"@id":"https:\/\/hypersense-software.com\/blog\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/hypersense-software.com\/blog\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/hypersense-software.com\/blog\/#organization","name":"HyperSense 
Software","url":"https:\/\/hypersense-software.com\/blog\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/hypersense-software.com\/blog\/#\/schema\/logo\/image\/","url":"https:\/\/hypersense-software.com\/blog\/wp-content\/uploads\/2023\/04\/logo-hypersense-512.svg","contentUrl":"https:\/\/hypersense-software.com\/blog\/wp-content\/uploads\/2023\/04\/logo-hypersense-512.svg","width":64,"height":64,"caption":"HyperSense Software"},"image":{"@id":"https:\/\/hypersense-software.com\/blog\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/hypersense.software","https:\/\/x.com\/HyperSenseSoft","https:\/\/www.instagram.com\/hypersensesoftware\/","https:\/\/ro.pinterest.com\/HyperSenseSoft\/","https:\/\/www.linkedin.com\/company\/hypersense-software\/","https:\/\/www.behance.net\/hypersense","https:\/\/www.youtube.com\/@hypersensesoftware","https:\/\/github.com\/HyperSense-Software"]},{"@type":"Person","@id":"https:\/\/hypersense-software.com\/blog\/#\/schema\/person\/ab8c2a667674a1b3926d6b1f0685ab3c","name":"Andrei Neacsu","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/hypersense-software.com\/blog\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/3dedf5440207d67bade8089703be1d2424d9d03a74e060a0cac6c7e1d24b5009?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/3dedf5440207d67bade8089703be1d2424d9d03a74e060a0cac6c7e1d24b5009?s=96&d=mm&r=g","caption":"Andrei Neacsu"},"description":"Andrei, CTO and co-founder of HyperSense Software Inc., has an extensive career spanning over 15 years in the tech industry. With hands-on experience in mobile and web development, cloud infrastructure, and DevOps, he has been instrumental in both startup launches and enterprise-level tech transformations. 
His approach intertwines deep technical knowledge with strategic business insights, aiding in everything from vision setting and market research to contract negotiations and investor relations. As a member of the Forbes Business Council, he consistently delivers valuable insights in the areas of technology and people management.","url":"https:\/\/hypersense-software.com\/blog\/author\/andrei-neacsu\/"}]}},"_links":{"self":[{"href":"https:\/\/hypersense-software.com\/blog\/wp-json\/wp\/v2\/posts\/2043","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/hypersense-software.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/hypersense-software.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/hypersense-software.com\/blog\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/hypersense-software.com\/blog\/wp-json\/wp\/v2\/comments?post=2043"}],"version-history":[{"count":2,"href":"https:\/\/hypersense-software.com\/blog\/wp-json\/wp\/v2\/posts\/2043\/revisions"}],"predecessor-version":[{"id":4119,"href":"https:\/\/hypersense-software.com\/blog\/wp-json\/wp\/v2\/posts\/2043\/revisions\/4119"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/hypersense-software.com\/blog\/wp-json\/wp\/v2\/media\/2044"}],"wp:attachment":[{"href":"https:\/\/hypersense-software.com\/blog\/wp-json\/wp\/v2\/media?parent=2043"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/hypersense-software.com\/blog\/wp-json\/wp\/v2\/categories?post=2043"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/hypersense-software.com\/blog\/wp-json\/wp\/v2\/tags?post=2043"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}