Summary

  • This post looks at one of Microsoft’s interesting datasets for applying machine learning to cyber security analytics use cases.
  • We’ll look at some of the issues that have been found with it, and use a corrected version of the data.
  • Specifically, we’ll mainly use PySpark, the Python API for Apache Spark, running on Databricks to process large-scale, distributed data. We’ll apply it to building a machine-learning model that can distinguish between benign traffic and attack traffic in the dataset.
  • We’ll then see how to deploy this model as an API using the FastAPI library and run it within our Databricks environment.

By Rob Harrand
