Contributor

@ailrst ailrst commented Jan 31, 2023

This PR prepares Macaron for a policy engine that uses the Souffle Datalog interpreter.

Architecture

The goal is to use souffle to evaluate the policy, while loading the facts directly from the sqlite database.

For this to work, the following requirements from the Souffle docs must be met:

The data is expected to be stored in a table matching the relation name prefixed by an underscore and the sqlite3 database is expected to contain a view matching the relation name. For example, for the relation edge, the sqlite3 database should have a table named _edge.

To load the data, the Souffle program needs a declared relation that matches the view, and a corresponding input statement.
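
As a minimal sketch of that naming convention, the snippet below uses Python's built-in sqlite3 module to create an `_edge` table and an `edge` view over it. The column names `src` and `dst` are assumptions for illustration only; they come from neither the Souffle docs nor Macaron.

```python
import sqlite3

# In-memory database so the sketch leaves nothing on disk.
conn = sqlite3.connect(":memory:")

# The data lives in the underscore-prefixed table ...
conn.execute("CREATE TABLE IF NOT EXISTS _edge (src TEXT, dst TEXT)")
# ... and a view named after the relation exposes it.
conn.execute("CREATE VIEW IF NOT EXISTS edge AS SELECT src, dst FROM _edge")

# Insert one row into the table and read it back through the view.
conn.execute("INSERT INTO _edge (src, dst) VALUES (?, ?)", ("a", "b"))
rows = conn.execute("SELECT src, dst FROM edge").fetchall()
print(rows)
```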

Broadly, this PR works as follows:

  1. Tables are defined declaratively with SQLAlchemy, both in Macaron and in the checks.
  2. From the analyzer class, DatabaseManager.create_tables() creates the database, tables, and views if they don’t exist.
  3. Checks store populated ORM-mapped tables in CheckResult["result_tables"].
  4. After analysis completes, the Analyzer stores these in the database, along with the information Macaron itself stores, such as the analyzed repositories and the dependency tree.

The policy engine is invoked from a separate script, which is passed the database file and a policy file:

$ python -m macaron.policy_engine -h
usage: policy_engine [-h] -d DATABASE [-f FILE] [-s]

options:
  -h, --help            show this help message and exit
  -d DATABASE, --database DATABASE
                        Database path
  -f FILE, --file FILE  Replace policy file
  -s, --show-preamble   Show preamble

$ python -m macaron.policy_engine -d output/macaron.db -f tests/policy_engine/resources/policies/testpolicy.dl -h
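
For reference, a command line with this shape could be declared with argparse roughly as follows. This is a sketch reconstructed from the help text above, not the actual contents of policy_engine/__main__.py; the function name `build_parser` is hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the options listed in the help output."""
    parser = argparse.ArgumentParser(prog="policy_engine")
    # -d is the only required option in the usage line.
    parser.add_argument("-d", "--database", required=True, help="Database path")
    parser.add_argument("-f", "--file", help="Replace policy file")
    parser.add_argument("-s", "--show-preamble", action="store_true", help="Show preamble")
    return parser


# Parse an explicit argument list instead of sys.argv to keep the sketch self-contained.
args = build_parser().parse_args(["-d", "output/macaron.db", "-s"])
print(args.database, args.file, args.show_preamble)
```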

At this stage, the script does the following:

  1. The database is opened and its full schema is reflected into the SQLAlchemy metadata
  2. For each table beginning with an _ a corresponding souffle declaration and import is generated
  3. Some helper relations and rules are generated
  4. The prelude is constructed by combining the import statements, helper relations, and some additional non-generated rules
  5. A file is created with the generated prelude prepended to the actual policy file
  6. Souffle is invoked on this file, and the results are printed
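
Steps 2, 4 and 5 can be sketched as below. This is an illustration under stated assumptions, not the code in souffle_code_generator.py: it reads the schema with sqlite3 rather than SQLAlchemy reflection, the function name `generate_prelude` is hypothetical, and the mapping from SQLite column types to Souffle types is an assumption. The `.input relation(IO=sqlite, dbname="...")` directive is the form Souffle documents for SQLite input.

```python
import sqlite3

# Assumed mapping from SQLite column types to Souffle primitive types;
# anything not listed falls back to `symbol`.
SOUFFLE_TYPES = {"INTEGER": "number", "REAL": "float"}


def generate_prelude(conn: sqlite3.Connection, db_path: str) -> str:
    """Emit a .decl and a .input for every table whose name starts with '_'."""
    lines = []
    names = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )]
    for table in names:
        if not table.startswith("_"):
            continue
        relation = table[1:]  # the view and relation share the un-prefixed name
        # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
        columns = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        attrs = ", ".join(
            f"{name}: {SOUFFLE_TYPES.get(ctype.upper(), 'symbol')}"
            for _, name, ctype, *_ in columns
        )
        lines.append(f".decl {relation}({attrs})")
        lines.append(f'.input {relation}(IO=sqlite, dbname="{db_path}")')
    return "\n".join(lines) + "\n"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE _repository (id INTEGER, full_name TEXT)")  # hypothetical table
conn.execute("CREATE TABLE scratch (x TEXT)")  # no underscore prefix, so it is skipped

prelude = generate_prelude(conn, "output/macaron.db")
policy_text = "// user policy goes here\n"  # placeholder for the policy file contents
program = prelude + policy_text  # the prelude is prepended to the policy
print(program)
```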

Changes Summary

  • Import SQLAlchemy to manage database connection

  • Refactor DatabaseManager to use SQLAlchemy (api change)

    • and add corresponding unit tests
  • AnalyzeContext now returns orm-mapped tables to be inserted into the database, rather than constructing SQL queries

  • CheckResult has a new field "result_tables: list[Table]"

  • Analyzer now populates tables to store the analysis, dependency, and slsa-level results, and check_results

  • AnalyzeContext now stores an ORM-mapped table to represent the repository being analyzed, which is stored to the database by the Analyzer object before analysis starts

  • Analyzer stores all tables which checks insert into CheckResult["result_tables"] to the database after analysis

    • Nearly all checks are modified to define and store result tables
  • base_check.py defines a table to store check results

  • base_check.py defines an SQLAlchemy declarative mixin, CheckFactsTable, which defines the check_result id and repository id foreign key fields. When a result table inherits from it, the analyzer populates these fields automatically.

  • provenance_l3_check is stricter as per pull/29.

  • add: policy_engine/__main__.py is the entry point for the policy engine

  • add: policy_engine/souffle_code_generator.py contains the logic for generating the souffle datalog for data import

  • add: policy_engine/souffle.py contains the wrapper for invoking souffle in a temporary directory

    • and corresponding unit tests
  • policy_engine/policy.py has some changes due to a manually reverted refactor; it will likely have to be refactored again to integrate the policy engine
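
The "invoke souffle in a temporary directory" idea from the list above might look roughly like this. It is a sketch, not the code in policy_engine/souffle.py: the function names are hypothetical, and it assumes a `souffle` binary on the PATH and that each `.output` relation is written as a `.csv` file into the directory passed with `-D`.

```python
import subprocess
import tempfile
from pathlib import Path


def build_command(source: Path, out_dir: Path, souffle_bin: str = "souffle") -> list[str]:
    """Command line that interprets `source` and writes outputs to `out_dir`."""
    return [souffle_bin, "-D", str(out_dir), str(source)]


def run_souffle(program_text: str, souffle_bin: str = "souffle") -> dict[str, str]:
    """Run Souffle on the given program text inside a throwaway directory.

    Returns the contents of each produced CSV file, keyed by relation name.
    """
    with tempfile.TemporaryDirectory() as tmp:
        tmp_path = Path(tmp)
        source = tmp_path / "policy.dl"
        source.write_text(program_text, encoding="utf-8")
        out_dir = tmp_path / "out"
        out_dir.mkdir()
        # check=True raises CalledProcessError if Souffle reports an error.
        subprocess.run(
            build_command(source, out_dir, souffle_bin),
            check=True,
            capture_output=True,
            text=True,
        )
        # Read the results before the temporary directory is removed.
        return {f.stem: f.read_text(encoding="utf-8") for f in out_dir.glob("*.csv")}


# Only the command is built here, so the sketch runs without Souffle installed.
command = build_command(Path("policy.dl"), Path("out"))
print(command)
```

Calling `run_souffle(program)` with a program that contains an `.output` directive would then return that relation's rows as CSV text, provided Souffle is installed.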

To do

  • Policy engine: validate database version before proceeding
  • Integrate provenance policies using CUE or proof of concept policy engine
  • Have macaron run policy and include result in reports
  • Update build; add souffle to docker
  • Filter duplicate analyze_ctx
@oracle-contributor-agreement oracle-contributor-agreement bot added the "OCA Verified" label (All contributors have signed the Oracle Contributor Agreement.) Jan 31, 2023
@ailrst ailrst force-pushed the database-policy branch 7 times, most recently from e1274f1 to 99f4e58 on January 31, 2023 06:52
@ailrst ailrst changed the title from "feat: Database update and policy engine" to "feat: add check output to database and implement souffle policy engine" Feb 1, 2023
@ailrst ailrst force-pushed the database-policy branch 2 times, most recently from 0a6c6d8 to 09a2f2d on February 1, 2023 07:13
"""Verify a provenance against a user defined policy."""
prov_file = verify_args.provenance
policy_file = verify_args.policy
policy_files = list(filter(lambda path: ".yaml" == os.path.splitext(path)[1], global_config.policy_paths))
Member

Extension can be .yml too.
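
One way to accept both extensions, sketched with a plain list standing in for `global_config.policy_paths`; the helper name `yaml_policy_files` is hypothetical.

```python
import os

# Both spellings of the YAML extension are accepted.
YAML_EXTENSIONS = {".yaml", ".yml"}


def yaml_policy_files(paths):
    """Keep only the paths whose extension marks them as YAML."""
    return [path for path in paths if os.path.splitext(path)[1] in YAML_EXTENSIONS]


selected = yaml_policy_files(["a.yaml", "b.yml", "c.dl"])
print(selected)
```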

Contributor Author

True. I also added a todo under the policy class. I think it would eventually be better to have the Policy class define which file types it can be constructed from (e.g. with a method that is passed the filename), so the configuration can just ask each policy class and pick the first one that says yes. Currently, though, CUE and YAML policies both use the same class, and Souffle policies are another unrelated class since they need different interfaces, so the design could still be rationalized further.
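
A rough sketch of that idea, assuming hypothetical class and method names (`YamlOrCuePolicy`, `SoufflePolicy`, `accepts`, `load_policy`) that do not exist in this PR. The `.cue` and `.dl` suffixes are likewise assumptions.

```python
from pathlib import Path


class BasePolicy:
    """Each policy class declares which file suffixes it can be built from."""

    suffixes: frozenset = frozenset()

    def __init__(self, path: str) -> None:
        self.path = path

    @classmethod
    def accepts(cls, path: str) -> bool:
        return Path(path).suffix in cls.suffixes


class YamlOrCuePolicy(BasePolicy):
    # CUE and YAML policies currently share one class, per the comment above.
    suffixes = frozenset({".yaml", ".yml", ".cue"})


class SoufflePolicy(BasePolicy):
    suffixes = frozenset({".dl"})


def load_policy(path: str, candidates=(YamlOrCuePolicy, SoufflePolicy)) -> BasePolicy:
    """Ask each policy class in turn and use the first one that says yes."""
    for policy_cls in candidates:
        if policy_cls.accepts(path):
            return policy_cls(path)
    raise ValueError(f"No policy class accepts {path!r}")


print(type(load_policy("policy.yml")).__name__)
print(type(load_policy("testpolicy.dl")).__name__)
```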

@@ -1,4 +1,4 @@
-# Copyright (c) 2022 - 2022, Oracle and/or its affiliates. All rights reserved.
+# Copyright (c) 2022 - 2023, Oracle and/or its affiliates. All rights reserved.
Member

Some of these files seem to have copyright header updates only. Can you please unstage?

Contributor Author

@ailrst ailrst Feb 7, 2023

These should have been removed in a recent rebase.

Comment on lines 22 to 24
/**
* The build is verifiably automated and .
*/
Member

Suggested change
-/**
-* The build is verifiably automated and .
-*/
+/**
+* The build is verifiably automated and deployable.
+*/
Contributor Author

Fixed in 7584d19

@ailrst ailrst marked this pull request as ready for review February 6, 2023 05:54
@ailrst ailrst force-pushed the database-policy branch 2 times, most recently from 69bb689 to 7584d19 on February 6, 2023 07:15

.decl json_path(j: JsonType, a: JsonType, key:symbol)

json_path(a, b, key) :- a = $Object(k, b), json(name,_,a), key=cat(name, cat(".", k)).
Contributor

Just a small note on the inconsistent spacing of the attributes in the relations. Perhaps we could create a ticket for resolving it later. Btw, have you encountered any linter for souffle 🤔 ?

Member

> Just a small note on the inconsistent spacing of the attributes in the relations. Perhaps we could create a ticket for resolving it later. Btw, have you encountered any linter for souffle 🤔 ?

I have come across this repo to lint Souffle Datalog: https://github.com/langston-barrett/souffle-lint
It could be something to add as a pre-commit hook locally, but haven't tried it out yet.

self.db_man = DatabaseManager(db_path)
"""Set up the database and ensure it is empty."""
self.db_path = str(Path(__file__).parent.joinpath("macaron.db"))
print(self.db_path)
Member

One more print left here 😅

@oracle-contributor-agreement

Thank you for your pull request and welcome to our community! To contribute, please sign the Oracle Contributor Agreement (OCA).
The following contributors of this PR have not signed the OCA:

To sign the OCA, please create an Oracle account and sign the OCA in Oracle's Contributor Agreement Application.

When signing the OCA, please provide your GitHub username. After signing the OCA and getting an OCA approval from Oracle, this PR will be automatically updated.

If you are an Oracle employee, please make sure that you are a member of the main Oracle GitHub organization, and your membership in this organization is public.

@oracle-contributor-agreement oracle-contributor-agreement bot added OCA Required At least one contributor does not have an approved Oracle Contributor Agreement. OCA Verified All contributors have signed the Oracle Contributor Agreement. and removed OCA Verified All contributors have signed the Oracle Contributor Agreement. OCA Required At least one contributor does not have an approved Oracle Contributor Agreement. labels Feb 24, 2023
@behnazh-w behnazh-w merged commit 9e89477 into staging Feb 26, 2023
@oracle-contributor-agreement oracle-contributor-agreement bot added OCA Required At least one contributor does not have an approved Oracle Contributor Agreement. OCA Verified All contributors have signed the Oracle Contributor Agreement. and removed OCA Verified All contributors have signed the Oracle Contributor Agreement. OCA Required At least one contributor does not have an approved Oracle Contributor Agreement. labels Feb 27, 2023
art1f1c3R pushed a commit that referenced this pull request Nov 29, 2024
#46) Signed-off-by: Alistair Michael <alistair.michael@oracle.com> Signed-off-by: behnazh-w <behnaz.hassanshahi@oracle.com>
Labels

OCA Verified All contributors have signed the Oracle Contributor Agreement.

3 participants