Baseline Model Cont. / Commuting Matrices

Topics

This week’s assignments will guide you through the following topics:

  • Finishing a baseline model
  • Creating the HIN in the Hindroid paper

Reading

Please read the following:

  • (Re)Read HinDroid section 3.2, along with section 4-4.2. You will have to read section 3.2 many times!
  • Review SVMs and the “Kernel Trick” (The communiting matrices of section 3.2 will be the kernel in our SVM classifier).

Tasks

Complete the following tasks:

  • Finish you work from last week: create a baseline model using hand-coded features developed in the previous two weeks. Create a few new features this week (e.g. derived from the Package or method name for example, informed by your EDA). The choice of classifier is up to you. (Note: you should not be implementing Hindroid here). What is the most appropriate evaluation metric here? (Hint: it’s not accuracy).
  • Implement the commuting matrix AA^T (keep in mind you will have to do this commuting matrices involving B, P, and I next week as well!). That is, write code that takes your intermediate csv data and creates the commuting matrix AA^T in Hindroid.
  • Take one of the Malware types you investigated this week and find the code in smali where the ‘bad’ behavior occurs (if you cannot solve this, it’s ok).

Note 1: Not all of the APIs in the dataset may be needed to compute commuting matrices. Which apis might you be able to ignore (think about what AA^T is – and how it measures similarity between two apps).

Note 2: You may find sparse matrices valuable. Explore the documentation and share useful tutorials on slack (e.g. this one?).

Weekly Questions

Answer the following questions on Canvas:

  • Consider the commuting matrix AA^T in the Hindroid paper. What do the elements represent?
    • Give an answer in terms of paths on hindroid’s HIN (graph).
    • Give an answer that is generally understandable, in iterms of apps and apis.
  • What evaluation metric(s) for your baseline did you use and why?
  • What was the most difficult thing you experienced this week, to discuss in section.