All chapters: Black Box Model comments #261
Comments
Comments from Andrew Duncan from the Turing Centre
General comments:
Specific comments:
• Section 2: "Artificial Intelligence models (including Machine Learning) are the most common type of black box models used today. Other forms of black box models may arise in future." It's not just about the nature of the model, it's also about the provenance. For example, it's quite common for an organisation to purchase a piece of commercial modelling software where the inner workings are proprietary and protected IP, in which case it is black box. I would argue this is a far more common existing setting for black box models (unless you're developing all your models in-house). Additionally, as models grow in complexity, there will come a point where they just have to be considered black box, simply because the inner workings are too complex and multi-faceted.
• Figure 3-1: Version control is certainly important for maintaining longevity of a software base, but is it really relevant to assurance? I would assume this would be mandated for all software engineering activities.
• Section 6.6: Concerns about ethics, etc. would pertain to any data-driven model (not just AI). Or are we using AI as a catch-all for all data-driven models (including white-box/grey-box models)? See previous comment.
• Section 7.6: For data-driven black-box models, it's also important that the Analyst produces an estimate of how long they expect the trained model to remain within validity thresholds (e.g. due to concept drift), and thus comes up with a plan for retraining, updating, etc. This needs to be accompanied by a plan of how additional or refreshed data will be brought in to refresh the model and what resource (compute) is required.
• Section 8.3.1: For data-driven models, especially black-box, most V&V methods are statistical, i.e. evaluating validity over a sample of the dataset.
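A possible illustration of the Section 7.6 and Section 8.3.1 points: a minimal Python sketch of statistical validation over a sample, checked against a validity threshold. It is not taken from the guidance itself. The function names, the use of classification accuracy as the validity measure, and the choice of a Wilson score interval are all assumptions made for illustration.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a proportion (z = 1.96 gives roughly 95%)."""
    if n <= 0:
        raise ValueError("sample size must be positive")
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half_width, centre + half_width


def within_validity_threshold(y_true, y_pred, threshold: float):
    """Hypothetical check: is the lower bound on sample accuracy above the threshold?

    Accuracy on a validation sample is only an estimate, so the lower end of
    the interval is compared with the threshold rather than the point estimate.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    lower, upper = wilson_interval(correct, len(y_true))
    return lower >= threshold, (lower, upper)


# Illustrative data only: 90 of 100 sampled predictions are correct.
y_true = [1] * 100
y_pred = [1] * 90 + [0] * 10
ok, interval = within_validity_threshold(y_true, y_pred, threshold=0.80)
```

Re-running the same check on refreshed samples over time is one way an Analyst might track whether a model is drifting towards its validity threshold and plan retraining accordingly.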
@lmadavies @Hurstharrier The branch that I created for these changes is now a fair bit behind main, so it would be good to create a new branch from main for these edits to minimise conflicts. Have either of you started to work on these edits on a local branch, or can you wait until we have committed the outstanding PRs (which should hopefully be done tomorrow)? It would be best to create the new branch only when you are ready to start work on the edits, so that we keep the new branch as closely synced with main as possible.
@irisoren-sg I haven't started adding in the comments as I didn't think I had the permissions set up, but I may do now. Happy to wait until you have made the outstanding PRs before starting on this. You can either start a new branch or rebase the current one; it shouldn't make too much difference.
@lmadavies I have deleted the branch and will start a new one once we have merged the big outstanding commits. I would prefer to minimise the chance of running into rebase conflicts. I'll let you know when the branch is ready.
Initial updates to the pages have been made. I would like to do a final review before merging. I have also tried to address the comments from Turing. The ones I have not included currently are:
• Figure 3-1: Version control is certainly important for maintaining longevity of a software base, but is it really relevant to assurance? I would assume this would be mandated for all software engineering activities.
• Section 6.6: Concerns about ethics, etc. would pertain to any data-driven model (not just AI). Or are we using AI as a catch-all for all data-driven models (including white-box/grey-box models)? See previous comment.
• Section 8.3.1: For data-driven models, especially black-box, most V&V methods are statistical, i.e. evaluating validity over a sample of the dataset.
@lmadavies I have reviewed the additions and made one minor change. On the other points, @valentine-scroll, our editor, will take a look too, as she is reviewing the whole document and it makes sense to include this branch. On the bits you haven't changed: Version control - strictly speaking he is right, but in my experience many of us don't treat models as pieces of software that require maintenance etc. Therefore, I think this is a useful reminder. On ethics, etc. - it seems to me that ethics concerns are more of an issue with AI (just read the papers!). Happy to include a reference to ethics for all analysis, but I think we need to also make the point for AI. On data-driven models - I agree but am not sure what to add or amend. I am open to suggestions.
Version control - I think we do want to include version control in the "types of assurance" figure. Andrew may be specifically thinking about git version control, which is a core software engineering activity, but the book is meant to cover all types of model (including non-coding models), so I think the meaning is broader here and relates to assurance. Personally, I would argue that every piece of analysis has a level of version control (at minimum: when it was produced and whether it is a dev, pre-QA or live version), but I won't add anything else at this stage as this goes slightly beyond AI/black box. For the maintenance point, I have added a few points about this already. I agree that it is an important point to remember specifically for AI models. On ethics - you have a reference to ethics in the Analytical Plan section on the design page. I will add something to section 6.3, Assurance activities in the engagement and scoping stage, to mention ethics again. On data-driven models - I agree, and am not sure what to amend or add here.
@lmadavies |
These comments on the sections related to black box models were made on the version that was live on Friday 8 November 2024. The sections covered are outlined below.
Definitions
E.g. Machine Learning, a subset of artificial intelligence, uses algorithms to learn from patterns in data without explicit business rules needing to be programmed. Some machine learning models are white box models and some are black box models.
Proportionality: 3.4 Artificial intelligence and business risk
No comments. Proportionality remains the same consideration. AI models are naturally a more complex analytical technique, so they will require more structure around the QA process, but this is in line with other complex models.
Analytical Life Cycle: 5.2.5 Maintenance and continuous review
Engagement and Scoping: 6.6 Black box models and the engagement and scoping stage
No comments. It highlights the areas where ML/AI models require additional scrutiny and offers a link to a document to assist with this.
Design Decisions
Analysis
Delivery and Communication: Black box models and the delivery, communication and sign-off stage