Hi All
An interesting insight:
We all know LLMs are great, but they often struggle with complex, domain-specific tasks such as those in cybersecurity (e.g., threat intelligence). In this study, we introduce SecKnowledge and CyberPal.AI to address these challenges and train security-expert LLMs.
We construct SecKnowledge, an instruction-tuning dataset generated using an expert-driven process on a wide range of security-related datasets.
The dataset construction involves two main steps:
In the first step, we create instructions based on predefined schemas established through domain expertise. These schemas define templates that are filled with domain expert knowledge and supplemented with LLM-generated content when necessary.
In the second step, we expand the initial dataset through a hybrid synthetic content-based data generation process.
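To make the first step more concrete, here is a minimal sketch of schema-driven instruction generation. It is only an illustration under my own assumptions, not the authors' pipeline: the schema layout, the field names, the `build_examples` helper and the two sample records are all hypothetical, and the LLM-supplemented content and the second (expansion) step are not shown.

```python
from string import Template

# Hypothetical schema: one instruction template and one response template.
# In the study, schemas are defined by domain experts; this layout is assumed.
SCHEMA = {
    "instruction": Template("What weakness does $cwe_id describe?"),
    "response": Template("$cwe_id ($name): $description"),
}

# Hypothetical expert-curated records used to fill the templates.
RECORDS = [
    {
        "cwe_id": "CWE-79",
        "name": "Cross-site Scripting",
        "description": "Untrusted input is placed into a web page without proper neutralization.",
    },
    {
        "cwe_id": "CWE-89",
        "name": "SQL Injection",
        "description": "Untrusted input is placed into an SQL command without proper neutralization.",
    },
]


def build_examples(schema, records):
    """Fill the schema templates with each record to get instruction/response pairs."""
    return [
        {
            "instruction": schema["instruction"].substitute(record),
            "response": schema["response"].substitute(record),
        }
        for record in records
    ]


if __name__ == "__main__":
    for example in build_examples(SCHEMA, RECORDS):
        print(example["instruction"])
        print(example["response"])
        print()
```

The point of the template approach is that the facts come from curated records rather than from free generation, so each instruction stays tied to expert-provided knowledge.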
Then, we train CyberPal.AI, a family of cyber-security expert LLMs capable of understanding complex security concepts. CyberPal.AI demonstrates the advantages of enhancing LLMs with our domain-knowledge instruction dataset, SecKnowledge.
Lastly, we developed SecKnowledge-Eval, a suite of evaluation datasets specifically designed to assess LLMs in the cyber-security domain. SecKnowledge-Eval combines evaluation datasets we constructed to assess LLMs’ capabilities on complex cyber-security tasks with public benchmarks, with the aim of providing a comprehensive and diverse evaluation suite.
CyberPal.AI demonstrated superior performance over its baseline models, showing a substantial average improvement of up to 24% in training-aligned tasks and up to 10% in public cyber-security benchmarks.
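For anyone wondering how a baseline and a tuned model might be compared on a multiple-choice benchmark of the kind mentioned above, here is a rough sketch. It assumes a simple accuracy metric and a hypothetical item format (`question`, `choices`, `answer` index); the study may score its tasks differently, and the `always_first` "model" is just a stand-in so the example runs.

```python
from typing import Callable, Dict, List

# Hypothetical multiple-choice items; "answer" is the index of the correct choice.
ITEMS: List[Dict] = [
    {
        "question": "Which CWE covers SQL injection?",
        "choices": ["CWE-89", "CWE-79"],
        "answer": 0,
    },
    {
        "question": "Which CWE covers cross-site scripting?",
        "choices": ["CWE-89", "CWE-79"],
        "answer": 1,
    },
]


def accuracy(model: Callable[[str, List[str]], int], items: List[Dict]) -> float:
    """Return the fraction of items where the model picks the correct choice index."""
    if not items:
        return 0.0
    correct = sum(
        1 for item in items
        if model(item["question"], item["choices"]) == item["answer"]
    )
    return correct / len(items)


def always_first(question: str, choices: List[str]) -> int:
    """Stand-in model that always selects the first choice."""
    return 0


if __name__ == "__main__":
    print(f"accuracy: {accuracy(always_first, ITEMS):.2f}")
```

Running the same `accuracy` function over a baseline model and its tuned counterpart on identical items gives two numbers that can be compared directly.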
Check out our approach here: https://lnkd.in/dn-QfSFB
See attached:
Regards
Caute_Cautim