[WEBSITE] Add scarf to readme for website analytics #219

Merged
merged 2 commits into from
Sep 18, 2024

cmarteepants
Collaborator

This adds website analytics to the Dag-Factory readme. Scarf privacy policy: https://about.scarf.sh/privacy-policy

Note that while you cannot explicitly opt out of website analytics for the publicly hosted readme (and docs), Scarf respects the browser Do Not Track (DNT) setting. If it is enabled in the browser, telemetry for that user will not be sent to Scarf.
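
For illustration only, here is a minimal sketch of how a server receiving pixel requests could honor the browser Do Not Track setting mentioned above. Browsers with the setting enabled send the HTTP request header `DNT: 1`. How Scarf actually implements this is not described in this thread; the function name `should_record_view` is hypothetical.

```python
from typing import Mapping


def should_record_view(headers: Mapping[str, str]) -> bool:
    """Return False when the request carries `DNT: 1`, True otherwise.

    Hypothetical helper: this is not Scarf's code, only a sketch of the idea.
    """
    # HTTP header names are case-insensitive, so normalize before the lookup.
    normalized = {name.lower(): value.strip() for name, value in headers.items()}
    return normalized.get("dnt") != "1"


print(should_record_view({"DNT": "1"}))  # request from a browser with DNT enabled
print(should_record_view({}))            # request without the header
```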

This adds website analytics to the Dag-Factory readme. Scarf privacy policy: https://about.scarf.sh/privacy-policy

Note that while you cannot explicitly opt-out of website analytics for the publicly hosted readme (and docs), Scarf respects browser DND. If that is set via the browser, telemetry for that user will not be sent to Scarf.
README.md
@@ -5,6 +5,7 @@
[![PyPi](https://img.shields.io/pypi/v/dag-factory.svg)](https://pypi.org/project/dag-factory/)
[![Code Style](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/ambv/black)
[![Downloads](https://pepy.tech/badge/dag-factory)](https://pepy.tech/project/dag-factory)
<img referrerpolicy="no-referrer-when-downgrade" src="https://static.scarf.sh/a.png?x-pxid=2bb92a5b-beb3-48cc-a722-79dda1089eda" />
Collaborator

This seems to be rendered as

<img src="https://camo.githubusercontent.com/4b5c60ccaf0796d9330635a0927b42159b34946c2bf4d7aa4e412c8b1f606f59/68747470733a2f2f706570792e746563682f62616467652f6461672d666163746f7279" alt="Downloads" data-canonical-src="https://pepy.tech/badge/dag-factory" style="max-width: 100%;">

Probably due to:

Pixel-based telemetry will work on standard webpages, rendered markdown documentation on package registry sites like Docker Hub, npm, and PyPi, and anywhere an image can be embedded, with a notable exception being GitHub. When GitHub renders markdown, it rewrites URLs from their original web address to https://camo.githubusercontent.com/, where GitHub hosts any linked images themselves. This prevents Scarf from providing insights to maintainers, since all that can now be detected at the original web address via the tracking pixel is undifferentiated traffic from GitHub.

Paragraph from: https://docs.scarf.sh/web-traffic/
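
As a small check of the rewriting described in the quoted paragraph, the sketch below decodes the camo URL from the rendered `<img>` tag above. It assumes the last path segment of a camo URL is the hex-encoded original address, which holds for this example: the result matches the tag's `data-canonical-src`. The helper name `decode_camo_url` is made up for this illustration.

```python
from urllib.parse import urlparse


def decode_camo_url(camo_url: str) -> str:
    """Recover the original image URL from a camo.githubusercontent.com URL.

    Assumes the final path segment is the hex-encoded original URL.
    """
    last_segment = urlparse(camo_url).path.rsplit("/", 1)[-1]
    return bytes.fromhex(last_segment).decode("utf-8")


camo = (
    "https://camo.githubusercontent.com/"
    "4b5c60ccaf0796d9330635a0927b42159b34946c2bf4d7aa4e412c8b1f606f59/"
    "68747470733a2f2f706570792e746563682f62616467652f6461672d666163746f7279"
)
print(decode_camo_url(camo))  # https://pepy.tech/badge/dag-factory
```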

Collaborator

That said, we should be able to see this change in https://pypi.org/project/dag-factory/ once we publish this change to PyPI.

Collaborator Author

@cmarteepants cmarteepants Sep 17, 2024

Confirmed with Arjun that it still works, but it won't be as fine-grained. We can do a separate one for the PyPI site.

Collaborator

@tatiana tatiana Sep 17, 2024

When users access the README.md hosted on GitHub, Scarf will see the traffic as coming from GitHub IPs, as if GitHub itself were generating it.

This means that, when the access comes from GitHub, we won't have conversion rates from viewing docs to downloading DAG Factory artifacts, and possibly won't know which parts of the DAG Factory documentation are viewed most. These seem to be the main features of this:
https://docs.scarf.sh/web-traffic/

That said, once the package is published, the data that comes from PyPI should be accurate. Would it be possible for us to filter out the misleading or incomplete information added by GitHub in the Scarf UI? Or would it make sense to have a PyPI README that differs from the GitHub one and only add the pixel to the PyPI README?

Collaborator Author

I used the pixel that was specific to the readme. Yes, we should use a separate one for PyPI. It would be interesting to compare, and if we find the one directly embedded in the readme isn't useful, we can remove it at that time.

Collaborator

@tatiana tatiana Sep 17, 2024

Should we make the PyPI change in a follow-up PR? ATM, both GitHub and PyPI use the same README.md:

readme = "README.md"

If we decide to split them, we can rename the tracking pixel from dag-factory-readme to dag-factory-github-readme and create a new dag-factory-pypi-readme. The only downside is that we'll need to keep two READMEs up to date instead of one.

I noticed the project doesn't currently have an automated release pipeline. An alternative, if we decide not to have the tracking pixel in the GitHub README, could be to keep the tracking pixel in a separate markdown file and merge it into the PyPI README as part of the deployment pipeline, using something like https://pandoc.org/
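
A rough sketch of what such a build step might look like, using plain Python string handling rather than pandoc. Everything here is an assumption for illustration: the function name `build_pypi_readme`, the placeholder pixel ID, and the idea of writing the result to a separate file that `readme` in the packaging config would then point to.

```python
# Same <img> shape as the one added in this PR, with the pixel ID left as a parameter.
PIXEL_TEMPLATE = (
    '<img referrerpolicy="no-referrer-when-downgrade" '
    'src="https://static.scarf.sh/a.png?x-pxid={pixel_id}" />'
)


def build_pypi_readme(github_readme: str, pixel_id: str) -> str:
    """Return the README text with a Scarf pixel appended for PyPI.

    If the text already contains a Scarf pixel, it is returned unchanged,
    so running the step twice does not add a second pixel.
    """
    if "static.scarf.sh/a.png" in github_readme:
        return github_readme
    pixel = PIXEL_TEMPLATE.format(pixel_id=pixel_id)
    return github_readme.rstrip("\n") + "\n\n" + pixel + "\n"


# In-memory demo; a real pipeline would read README.md and write the PyPI variant.
sample = "# dag-factory\n\nProject description.\n"
print(build_pypi_readme(sample, "hypothetical-pypi-pixel-id"))
```

With this approach the GitHub README stays the single source of truth, and only the generated PyPI copy carries the pixel.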

Collaborator

For the record, it seems these docs in Scarf are outdated:

When GitHub renders markdown, it rewrites URLs from their original web address to https://camo.githubusercontent.com/, where GitHub hosts any linked images themselves. This prevents Scarf from providing insights to maintainers, since all that can now be detected at the original web address via the tracking pixel is undifferentiated traffic from GitHub.

In practice, as of 20 September, we can track pixel-based events on GitHub markdown pages with the current changes. We were able to see locations and companies in the Scarf UI.

@tatiana tatiana merged commit b7f07a9 into main Sep 18, 2024
6 checks passed
@tatiana tatiana added this to the DAG Factory 0.20.0 milestone Oct 10, 2024
@tatiana tatiana mentioned this pull request Oct 17, 2024
tatiana added a commit that referenced this pull request Oct 22, 2024
### Added
- Support using envvar in config YAML by @tatiana in #236
- **Callback improvements**
  - Support installed code via python callable string by @john-drews in #221
  - Add `callback_file` & `callback_name` to `default_args` DAG level by @subbota19 in #218
  - Cast callbacks to functions when set with `default_args` on TaskGroups by @Baraldo and @pankajastro in #235
- **Telemetry**
  - For more information, please read the [Privacy Notice](https://github.com/astronomer/dag-factory/blob/main/PRIVACY_NOTICE.md#collection-of-data).
  - Add scarf to readme for website analytics by @cmarteepants in #219
  - Support telemetry during DAG parsing emitting data to Scarf by @tatiana in #250.

### Fixed
- Build DAGs when there is an invalid YAML in the DAGs folder by @quydx and @tatiana in #184

### Others
- Development tools
  - Fix make docker-run by @pankajkoti in #249
  - Add vim dot files to .gitignore by @tatiana in #228
  - Use Hatchling to modern package building by @kaxil in #208
- CI
  - Fix static check failures in PR #218 by @pankajkoti in #251
  - Fix pre-commit checks by @tatiana in #247
  - Remove tox and corresponding build jobs in CI by @pankajkoti in #248
  - Install Airflow with different versions in the CI by @pankajkoti in #237
  - Run pre-commit hooks on all existing files by @pankajkoti in #245
  - Add Python 3.11 and 3.12 to CI test pipeline by @pankajkoti in #229
- Tests
  - Fix duplicate test name by @pankajastro in #234
  - Add static check by @pankajastro in #231
  - Fix running tests locally (outside the CI) by @tatiana in #227
  - Add the task_2 back to dataset example by @cmarteepants in #204
  - Remove unnecessary config line by @jlaneve in #202
- Documentation
  - Update the license from MIT to Apache 2.0 by @pankajastro in #191
  - Add registration icon and links to Airflow references by @cmarteepants in #190
  - Update quickstart and add feature examples by @cmarteepants in #189
### Breaking changes
- Removed support for Python 3.7
- The license was changed from MIT to Apache 2.0

Closes: #217