Fix image path in efficient_ai.qmd and update data_engineering.qmd #301
…gineering-Section Update data_engineering.qmd
Fixed the image path in efficient_ai.qmd
Hi Professor Vijay, I hope all is well! I've made significant updates and fixes that are crucial for our project's progress. I fixed the image path in efficient_ai.qmd, correcting the reference so that the figure renders correctly in the document. Additionally, I am working on telecom outage prediction and am fully committed to applying TinyML in this domain to the best of my abilities. Working on this book has been a truly enjoyable experience, especially given my more than 7 years of industry experience. I am deeply committed to this project and look forward to our continued collaboration. Please let me know your feedback; I am ready to work on other chapters.
Warm regards,
@Sara-Khosravi thanks again for these edits. In the future, could you please make sure that you modify only one file at a time, as it is easier to do merges and rollbacks if needed?
Hi Vijay,
Thank you for your positive feedback. I am glad that you are willing to
take a look at my project.
Regarding your question, I apologize for any oversight. I will double-check
to ensure I used the latest version of the repository. If it turns out I
didn't, I will make sure to synchronize with the latest version and update
my changes accordingly.
Thank you for bringing this to my attention. I appreciate your
understanding and patience.
Best regards,
Sara Khosravi
Could you please look over the comments and make the small tweaks?
@@ -173,7 +173,8 @@ Another important consideration is the relationship between model complexity and

Furthermore, while benchmark datasets, such as ImageNet [@russakovsky2015imagenet], COCO [@lin2014microsoft], Visual Wake Words [@chowdhery2019visual], Google Speech Commands [@warden2018speech], etc. provide a standardized performance metric, they might not capture the diversity and unpredictability of real-world data. Two facial recognition models with similar benchmark scores might exhibit varied competencies when faced with diverse ethnic backgrounds or challenging lighting conditions. Such disparities underscore the importance of robustness and consistency across varied data. For example, @fig-stoves from the Dollar Street dataset shows stove images across extreme monthly incomes. Stoves have different shapes and technological levels across different regions and income levels. A model that is not trained on diverse datasets might perform well on a benchmark but fail in real-world applications. So, if a model was trained on pictures of stoves found in wealthy countries only, it would fail to recognize stoves from poorer regions.

![Different types of stoves. Credit: Dollar Street stove images.](https://pbs.twimg.com/media/DmUyPSSW0AAChGa.jpg){#fig-stoves}
![Different types of stoves. Credit: Dollar Street stove images.](images/jpg/DmUyPSSW0AAChGa.jpg))
Could you please rename the file to something like dollar_street.jpg?
@@ -173,7 +173,8 @@ Another important consideration is the relationship between model complexity and
![Different types of stoves. Credit: Dollar Street stove images.](https://pbs.twimg.com/media/DmUyPSSW0AAChGa.jpg){#fig-stoves}
![Different types of stoves. Credit: Dollar Street stove images.](images/jpg/DmUyPSSW0AAChGa.jpg))
{#fig-stoves}
For consistency could we please put this next to the closing ]
@@ -8,7 +8,7 @@ bibliography: data_engineering.bib
Resources: [Slides](#sec-data-engineering-resource), [Videos](#sec-data-engineering-resource), [Exercises](#sec-data-engineering-resource), [Labs](#sec-data-engineering-resource)
:::

![_DALL·E 3 Prompt: Create a rectangular illustration visualizing the concept of data engineering. Include elements such as raw data sources, data processing pipelines, storage systems, and refined datasets. Show how raw data is transformed through cleaning, processing, and storage to become valuable information that can be analyzed and used for decision-making._](images/png/cover_data_engineering.png)
We should keep this as is because this is verbatim what went into the DALLE model :)
Once data is collected, thoughtful labeling through manual or AI-assisted annotation enables the creation of high-quality training datasets. Proper storage in databases, warehouses, or lakes facilitates easy access and analysis. Metadata provides contextual details about the data. Data processing transforms raw data into a clean, consistent format for machine learning model development.
Throughout this pipeline, transparency through documentation and provenance tracking is crucial for ethics, auditability, and reproducibility. Data licensing protocols also govern legal data access and use. Key challenges in data engineering include privacy risks, representation gaps, legal restrictions around proprietary data, and the need to balance competing constraints like speed versus quality.
By thoughtfully engineering high-quality training data, machine learning practitioners can develop accurate, robust, and responsible AI systems. This includes applications in embedded systems and TinyML, where resource constraints demand particularly efficient and effective data-handling practices. In the context of TinyML, data engineering practices take on a unique character. Resource-constrained devices often necessitate smaller datasets with high signal-to-noise ratios. Data collection may be limited to on-device sensors or specific environmental conditions. Crowdsourcing and synthetic data generation have become precious tools for generating specialized datasets with limited memory and processing power. Careful optimization techniques for data cleansing, feature selection, and model compression are essential for TinyML applications. By understanding these nuances, data engineers can empower the development of efficient and effective AI solutions at the edge.
## Resources {#sec-data-engineering-resource .unnumbered}

Data is the fundamental building block of AI systems. Without quality data, even the most advanced machine learning algorithms will fail. Data engineering encompasses the end-to-end process of collecting, storing, processing, and managing data to fuel the development of machine learning models. It begins with clearly defining the core problem and objectives, which guides effective data collection. Data can be sourced from diverse means, including existing datasets, web scraping, crowdsourcing, and synthetic data generation. Each approach involves tradeoffs between cost, speed, privacy, and specificity. Once data is collected, thoughtful labeling through manual or AI-assisted annotation enables the creation of high-quality training datasets. Proper storage in databases, warehouses, or lakes facilitates easy access and analysis. Metadata provides contextual details about the data. Data processing transforms raw data into a clean, consistent format for machine learning model development. Throughout this pipeline, transparency through documentation and provenance tracking is crucial for ethics, auditability, and reproducibility. Data licensing protocols also govern legal data access and use. Key challenges in data engineering include privacy risks, representation gaps, legal restrictions around proprietary data, and the need to balance competing constraints like speed versus quality. By thoughtfully engineering high-quality training data, machine learning practitioners can develop accurate, robust, and responsible AI systems, including embedded and TinyML applications.
Does this need to be deleted based on the above text?
There seems to be some repetition.
Cool. Thanks for taking a look. I left some comments.
Also, in the future, do you think we could make small changes because that
makes it easier to merge things in.
Vijay Janapa Reddi, Ph. D. |
John L. Loeb Associate Professor of Engineering and Applied Sciences |
John A. Paulson School of Engineering and Applied Sciences |
Science and Engineering Complex (SEC) | 150 Western Ave, Room #5.305 |
Boston, MA 02134 |
Harvard University |
My Website <http://scholar.harvard.edu/vijay-janapa-reddi> |
Google Scholar <https://scholar.google.com/citations?hl=en&user=gy4UVGcAAAAJ&view_op=list_works&sortby=pubdate> |
Edge Computing Lab <https://edge.seas.harvard.edu> |
Book Meeting <https://fantastical.app/vjreddi/> |
Contact Admin <https://scholar.harvard.edu/vijay-janapa-reddi/contact> |
It is greatly appreciated, Vijay. I will look at it and fix it as soon as
possible. Also, I am working on the next section. I will update you after I
consider your comment.
Have a wonderful time.
Looked over this, and the changes are already merged in from other edits we did, so these updates are already in!
Before submitting your Pull Request, please ensure that you have carefully reviewed and completed all items on this checklist.
Content
References & Citations
Quarto Website Rendering
Grammar & Style
Collaboration
Miscellaneous
Final Steps