CompSci boffins think they've come up with a novel way to recreate missing entries in log files.
In a paper titled Bagging Recurrent Event Imputation for Repair of Imperfect Event Log with Missing Categorical Events, Dr Sunghyun Sim and Professor Hyerim Bae (both from Pusan National University in South Korea), and Professor Ling Liu of the USA's Georgia Institute of Technology, point out that log files should faithfully record timestamps, event names, and other data.
But for whatever reason, logs are sometimes imperfect or omit some records, which makes it hard to reconstruct events. Logs with missing lines can also mess up the training of AI models that rely on them.
The three authors couldn't find a tool to recreate missing events. So they built an algorithm that correlates data from other relevant sources to generate the missing log entries. Essentially, it works by figuring out which bits of information from multiple sources are needed to form a log entry, and then automating the process of generating missing entries from the available info.
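For a rough feel of what "generating missing entries from the available info" can mean, here is a minimal Python sketch. It is not the authors' algorithm: it assumes a toy log in which each event has a hypothetical `resource` field and an `activity` field, and it fills a missing activity with whichever activity most often appears alongside the same resource. The function name, field names, and the `imputed` flag are all our own inventions for illustration.

```python
from collections import Counter, defaultdict


def impute_missing_activities(events, key="resource", target="activity"):
    """Fill missing `target` values with the value most often seen with the same `key`.

    A toy stand-in for relationship-based imputation, not the method in the paper.
    """
    # Count how often each target value co-occurs with each key value.
    counts = defaultdict(Counter)
    for event in events:
        if event.get(target) is not None:
            counts[event[key]][event[target]] += 1

    repaired = []
    for event in events:
        row = dict(event)          # copy, so the original log is left untouched
        row["imputed"] = False     # mark which entries are originals
        seen = counts.get(row[key])
        if row.get(target) is None and seen:
            row[target] = seen.most_common(1)[0][0]
            row["imputed"] = True  # mark reconstructed entries explicitly
        repaired.append(row)
    return repaired


# Hypothetical four-line log with one missing activity name.
log = [
    {"case": 1, "resource": "scanner", "activity": "scan"},
    {"case": 1, "resource": "clerk", "activity": "approve"},
    {"case": 2, "resource": "scanner", "activity": None},
    {"case": 2, "resource": "clerk", "activity": "approve"},
]
repaired_log = impute_missing_activities(log)
for row in repaired_log:
    print(row)
```

The sketch copies each event and tags it, so a reader of the repaired log can always tell a reconstructed line from an original one.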
"Since data is collected from multiple perspectives in numerous information systems, there is a relationship between the collected data," said Dr Sim. "Starting with this point, our study suggested a method of restoring missing event values by utilizing the relationship among entities in the event log, which can overcome human or system error."
The authors applied Systematic Event Imputation (SEI) and Multiple Event Imputation (MEI) simultaneously alongside a bagging recurrent event imputation (BREI) algorithm, using bootstrap sampling and recurrent event imputation (REI) to repair damaged event logs. The results were very promising, we're told: tests with actual event logs "improved restoration accuracy by 10–30 per cent compared to existing restoration algorithms.
"Moreover, it could restore almost 90 per cent of the data accurately even when more than half of it was missing."
The boffins' work has been published in IEEE Transactions on Services Computing. A summary, with a graphic attempting to explain it, has also been issued. The authors express their belief the algorithms they developed will soon be pressed into service by actual users in industry.
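The bootstrap-sampling ("bagging") idea mentioned earlier can be illustrated with a short Python sketch. To be clear, this is our own simplification under stated assumptions, not BREI itself: in place of the paper's recurrent imputation model it uses a plain lookup table that predicts an event from the one before it, trains many such tables on bootstrap resamples of the intact data, and lets them vote on each missing event. The names `fit_lookup` and `bagged_impute` and the toy traces are hypothetical.

```python
import random
from collections import Counter, defaultdict


def fit_lookup(pairs):
    """Learn 'previous event -> most common next event' from (prev, next) pairs."""
    counts = defaultdict(Counter)
    for prev, nxt in pairs:
        counts[prev][nxt] += 1
    return {prev: c.most_common(1)[0][0] for prev, c in counts.items()}


def bagged_impute(traces, n_models=25, seed=0):
    """Fill None entries by majority vote over models fitted on bootstrap samples."""
    rng = random.Random(seed)
    # Training data: adjacent event pairs where both events are known.
    pairs = [
        (t[i - 1], t[i])
        for t in traces
        for i in range(1, len(t))
        if t[i - 1] is not None and t[i] is not None
    ]
    if not pairs:
        # Nothing to learn from, so return unchanged copies.
        return [list(t) for t in traces]

    # Bagging: each model sees a resample (with replacement) of the pairs.
    models = [
        fit_lookup([rng.choice(pairs) for _ in pairs])
        for _ in range(n_models)
    ]

    repaired = []
    for t in traces:
        out = list(t)
        # Walk left to right so a freshly imputed event can inform the next gap.
        for i in range(1, len(out)):
            if out[i] is None and out[i - 1] is not None:
                votes = Counter(m[out[i - 1]] for m in models if out[i - 1] in m)
                if votes:
                    out[i] = votes.most_common(1)[0][0]
        repaired.append(out)
    return repaired


# Hypothetical traces; None marks a missing event name.
traces = [
    ["a", "b", "c"],
    ["a", "b", "c"],
    ["a", None, "c"],
    ["a", "b", None],
]
repaired_traces = bagged_impute(traces)
print(repaired_traces)
```

Voting across resampled models is what makes this "bagging"; the point of the technique in general is that an ensemble tends to be less sensitive to quirks in any one sample than a single model would be.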
The computer scientists' slide illustrating how their log-repair algorithm works. Source: Pusan National University
Hopefully that will only happen with full disclosure of which lines of a log have been reconstructed and which are originals – imputed logs clearly have potential to make life interesting for digital forensics practitioners. ®