Hi Jeff!
I'm getting this error message:
curl: (35) error:141A318A:SSL routines:tls_process_ske_dhe:dh key too small
Date: Tue Jun 23 10:00:13 -03 2020 Error retrieving Scheduled log from WGCS. Curl return code:35 Log file not created
Could you help me please?
Pierre
Version 5.3 Attached.
Many changes and improvements. Most importantly, if a partial file is returned it will now be correctly processed and the data will not be pulled a second time. Error handling is also improved. From the notes in the script, this release includes all improvements from the 5.2 line:
#version 5.2.1 Tries smaller queries if time taken exceeds query timeout and tries reducing time before reducing record count limit
# Also identifies error based on query timeout (version not released)
#version 5.2.2 Adds x-mfe-timeout header to curl and uses non blank last line to detect error
# rather than query timeout (version not released)
#version 5.2.3 Tries to recover data if partial data returned(version not released)
#version 5.2.4 Reorder code file returned check at beginning of file processing
# Consolidate code for processing returned file (version not released)
#version 5.2.5 Fixes bugs and adds 10 sec to request vs 'created to', to account for clock skew. (version not released)
#version 5.3 Rewrote merging to eliminate redundancies and make more logical
#version 5.3.1 Changed update of last success epoch to be based on record count rather than log file exists
# Removed header x-mfe-timeout added in 5.2.2
# Improved efficiency by ensuring that for each query request 'created from' is no more than a day prior to the request start, and
# coded so that 'request to' never exceeds 'created to'
# Changed logging / reporting so that if 'request to' is not specified, current epoch - dbBackset (default 2 mins) is used
Limited testing has been performed, so it is recommended that you verify the changes in your environment before putting them into use.
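To make the 5.3.1 window handling in the notes above a bit more concrete, here is a rough sketch of the clamping it describes; the variable names and values are illustrative only and do not match the script's actual code:
# illustrative sketch only - not the script's actual code or variable names
requestStartEpoch=$(date +%s)          # when this query run starts
dbBackset=120                          # default backset: 2 minutes
createdFrom=$(( requestStartEpoch - 172800 ))   # example value: 2 days ago
createdTo=$(( requestStartEpoch - dbBackset ))
requestTo=""                           # empty = 'request to' not specified
# if 'request to' is not specified, use current epoch - dbBackset
[[ -z $requestTo ]] && requestTo=$(( requestStartEpoch - dbBackset ))
# keep 'created from' within one day of the request start
(( createdFrom < requestStartEpoch - 86400 )) && createdFrom=$(( requestStartEpoch - 86400 ))
# never let 'request to' exceed 'created to'
(( requestTo > createdTo )) && requestTo=$createdTo
echo "createdFrom=$createdFrom createdTo=$createdTo requestTo=$requestTo"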
We just recently discovered that pulling logs from msg.mcafeesaas.com or msg.mcafee-cloud.com does NOT download logs from the EU, and CSR has the same issue.
https://kc.mcafee.com/corporate/index?page=content&id=KB91669&locale=en_US
It turns out that the EU has its own endpoint for pulling logs: eu.msg.mcafeesaas.com. However, when we pointed this script at that endpoint, we were getting 500 errors. The culprit was this header:
--header "x-mfe-timeout: $queryTimeout"
Once we removed that, it worked.
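For anyone hitting the same thing, roughly what the EU call ends up looking like; only the host comes from this post, while the path, credentials, output file, and query parameters below are placeholders rather than the script's real ones:
# sketch only - $apiPath, $user, $pass, $from, $to and $outFile are placeholders;
# the key point is simply that no x-mfe-timeout header is sent to the EU endpoint
curl -s --user "$user:$pass" \
     -o "$outFile" \
     "https://eu.msg.mcafeesaas.com/$apiPath?from=$from&to=$to"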
So now we have two copies of this script running on the same server, and the one for the EU has the problematic header removed.
Just an FYI if you have users in the EU and elsewhere. If someone feels like updating the script to handle multiple locations, that would be helpful.
Thanks for the post and the heads-up. That header is optional and, as it turns out, not really effective at addressing what it was intended to address, so it can be stripped regardless of whether you are pulling from the EU or NA log repositories. There are other repositories currently available as well; the log location and the server to pull from depend on where you choose to log in your UCE or Cloud ePO configuration.
In CSR 2.8 you would use multiple log sources to pull from the EU and NA.
As for modifying the script to pull from multiple locations: yes, it could be done, but I don't think I can get to it any time soon because it would be a non-trivial effort. Each source needs separate tracking, which is currently done in a single conf file. It is probably better to just run two instances independently in separate folders with separate conf files, as it sounds like you've already done. Good work.
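If it helps anyone setting this up, a minimal sketch of the two-instance layout; the folder names, script name, and schedule below are examples only, not anything shipped with the script:
# example layout: one independent copy per log repository, each with its own
# folder, conf file, and cron entry (names and paths are illustrative)
mkdir -p /opt/wgcs-pull/na /opt/wgcs-pull/eu
cp pullWgcsLogs.sh pullWgcsLogs.conf /opt/wgcs-pull/na/
cp pullWgcsLogs.sh pullWgcsLogs.conf /opt/wgcs-pull/eu/
# point the EU copy's conf at eu.msg.mcafeesaas.com and strip the
# x-mfe-timeout header from that copy, then schedule both independently:
#   */5 * * * *    cd /opt/wgcs-pull/na && ./pullWgcsLogs.sh >> pull.log 2>&1
#   2-59/5 * * * * cd /opt/wgcs-pull/eu && ./pullWgcsLogs.sh >> pull.log 2>&1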
Hi @jebeling
First of all, many thanks for your script, Jeff!
Can you please explain the logic behind these lines in the script?
# calculate number of records returned
# if last line blank pulled file is complete
if [[ -z $lastLine ]]; then
    # echo "Complete file returned (last line blank)"
    completeFile=true
else
    # if last line is not blank, delete useless records and add two blank lines
    if (( $logversion > 5 )) ; then
        # echo "Log version > 5 and last line not blank deleting error message"
        sed -i '$d' $qTmpFileName
    fi
    sed -i '$d' $qTmpFileName
    # add two blank lines
    echo Query timeout partial file returned, script will initiate additional queries
    echo "" >> $qTmpFileName
    echo "" >> $qTmpFileName
    completeFile=false
    cp $qTmpFileName $qOrigEndCTime.part
fi
if [[ $headerBlank == "true" ]] && [[ -z $lastLine ]] ; then
    recordCount=0
    lastEpoch=$qEndCTime
else
    recordCount=`wc -l < $qTmpFileName`
    recordCount=$(($recordCount - $extraLines))
    # echo Original record count: $recordCount
fi
if [ $recordCount -gt 0 ]; then
Wow, you are really diving into it. The backend code has changed quite a bit since the original release and the intermediary releases, and it's been quite a while since I wrote the last release, but I will attempt to answer your questions to the best of my knowledge. If the last line is not blank, then not all the data for a given API call was retrieved, and we may not have gotten all the log lines for the last timestamp. Therefore every record from the last timestamp must be removed (useless, because you can't just get the remaining records for that timestamp with the next query), and the next API call should include records for the timestamp whose records were incomplete and were deleted.
Same goes for record limit. If you hit the record limit, then you aren't assured of getting all the records for the last timestamp, so you delete them and get them with the next pull.
Max time only matters if you also hit the record limit. If you don't hit the record limit, max time will bound the query and you will get all the records for the last timestamp.
A successful, complete return actually ends in two blank lines. If another query is needed to complete the original time range, then you want to delete the trailing blank lines so that when you merge you don't end up with two blank lines in the middle of the merged log.
If the last line is not blank, something broke, but we don't want to throw away all the data, just data that would be duplicated in a subsequent query. Adding the blank lines back in normalizes the file for subsequent processing. A bit of a kludge, but it was the easiest way to handle it after I discovered this anomaly.
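To illustrate the idea, here is a sketch of the concept only, not the script's actual code; it assumes comma-separated records in timestamp order with the timestamp in the first field, and the file name is made up:
# find the timestamp of the last non-blank record
partFile=example.part
lastTs=$(awk -F, 'NF { ts = $1 } END { print ts }' "$partFile")
# drop every record that shares that (possibly incomplete) timestamp,
# so the next query can re-pull it completely
awk -F, -v ts="$lastTs" '$1 != ts' "$partFile" > "$partFile.clean"
# restore the two trailing blank lines a complete pull would have had
printf '\n\n' >> "$partFile.clean"
mv "$partFile.clean" "$partFile"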
Lines 435-437 you are correct.
Lines 440-446 correct, see above
Line 380: I think I took care of that with a rename at some point, but I would have to look at the code. Shouldn't hurt to remove them, but it would be nice to have them for debugging if they are remnants; that could happen if someone aborts the script midstream, but then again you wouldn't reach the deletion at the end of the script if that happened. 😉
Many thanks for the quick reply, Jeff.
I believe many of these quirks should be fixed on the server side to simplify the API and the client-side handling as much as possible. For example, better server logic could avoid discarding events that share the last timestamp, which would make the transfer more efficient.
I have a large WGCS customer that generates logs at a rate of hundreds of MB/min; processing the logs takes a lot of CPU power and produces high I/O. Sometimes I have the impression that the script cannot keep up with the incoming log rate. The culprit is the repeated use of "sed" -i, which causes roughly twice the I/O of the file size, because it creates a temporary copy.
sed can be replaced with truncate if all that is needed is to delete the last characters. truncate works with metadata only, not touching the data itself, as the timings and the sketch below show.
[user@server]$ ls -lh largefile
-rw-r--r-- 1 splunk splunk 3.1G Mar 21 17:03 largefile
[user@server]$ wc -l largefile
7987298 largefile
[user@server]$ time sed -i '$d' largefile
real 0m24.946s
user 0m3.945s
sys 0m21.000s
[user@server]$ time stat --format="%s" largefile
3265044288
real 0m0.001s
user 0m0.000s
sys 0m0.001s
[user@server]$ echo $(( $( stat --format="%s" largefile ) -2 ))
3265044286
[user@server]$ time truncate -s $(( $( stat --format="%s" largefile ) -2 )) largefile
real 0m0.004s
user 0m0.002s
sys 0m0.001s
[user@server]$ ls -l largefile
-rw-r--r-- 1 splunk splunk 3265044286 Mar 21 19:52 largefile
[user@server]$
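Building on that, here is a sketch of how the same trick could replace the script's sed -i '$d' calls, removing a whole last line by truncation instead of a rewrite; this is untested, and the file variable simply mirrors the excerpt quoted earlier:
# possible drop-in for: sed -i '$d' $qTmpFileName
f=$qTmpFileName
# byte length of the last line, including its trailing newline if present
lastLen=$(tail -n 1 "$f" | wc -c)
size=$(stat --format="%s" "$f")
# chop the last line off by shrinking the file in place
truncate -s $(( size - lastLen )) "$f"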
Additionally, I found that some SIEMs, particularly Splunk, can be configured to ignore blank lines, so all of the blank-line handling could be skipped.
My customer will generate much more log data in the future, so I'll be very happy to work with you, Jeff, to optimize the script. Other approaches could include splitting the work between several worker scripts by region or by time.
We also generate large log files (about 1 GB every 5 minutes). We use CSR in a high-performance server and database environment to get the logs on premises and process them. We also switched from a UNIX-based log collection for SIEM to the McAfee Logger tool. There was a fix we needed, but it appears to be running great now.
With the McAfee Logger tool (Windows only), you can send the data directly to a server over a syslog port, or you can save the logs and have your SIEM server pick up the files directly.