Add a portal to see currently open windows #304
Comments
It sounds very niche. What other applications would use this portal? I know Discord is a common complaint, in that it can't see other apps, but since it's proprietary, somebody would have to convince them to use this anyway. |
Discord is another example, but another use case would be panels that can support multiple compositors (maybe not a good use case for Flatpak, but a use case nonetheless). There are also competitors to ActivityWatch which are not open source (most notably RescueTime) which would have the same use case as us. I guess it's kind of niche, but not extremely niche. Sooner or later some API like this will be needed for Wayland, and to me xdg-desktop-portal seems like the best fit. |
This is not the kind of information we would normally want to leak into sandboxes, I think. |
@matthiasclasen Screenshots shouldn't be leaked by default either, since they contain the same information and more, but you need to explicitly grant permission for that, right? |
A screenshot is an explicit operation. You take a screenshot of an individual window, present the result to the user, and ask him: "share this with "SpyApp deluxe" ?" What you are asking for is hard to present to the user in a meaningful form. |
Why wouldn't it be possible to do the same with this? Show a prompt containing "SpyApp is requesting access for information about all your running applications" and give the user the ability to allow or deny the request. By default this request should probably only be valid for the current session, but it would also be nice to have a checkbox to remember this for future sessions as well. This is how it is handled in macOS and I wish it was the same on Linux. However, on macOS if you allow it once it will allow it until you go into the settings and revoke the permissions; only allowing it for the current session by default seems more sane IMO. But that's just a small detail. |
It can certainly be done. A goal of portals, though, has been not to pester users with yes/no questions like that, but rather to tie permission to specific actions that make logical sense to cancel, like choosing a file. |
With all the recent press about smartphone addiction and the academic studies on how too much screen time is impacting social relations and mental health, there is a growing need for people to be able to track their daily activity on mobile devices and limit their screen time. Android and iOS now have this ability, and mobile Linux on upcoming devices like the Purism Librem 5, Pine64 PinePhone and Necuno Mobile will need it. Please consider adding this feature. I know that I will use it on my Librem 5, and I'm sure that others will as well. I don't mind having to give an app permission to access this information the first time that I open SpyApp. We are going to need authorization of permissions like this if Linux is ever going to become a mobile OS that can compete with Android and iOS. |
It would probably be implemented in the host side on the Librem so you wouldn't need any portals, so it wouldn't help. There's also no host API available right now. Somebody should really start with that, the portal can come later. |
The issue is that they already have such an incredible amount of work to do that it will take years until this feature becomes a priority for them, and after that it will take yet more time until it's implemented and shipped. A third-party app could develop this in parallel with Purism's work and work on more than one compositor.
wlroots' own protocols actually have some of the APIs I'm requesting; I will start prototyping with them tonight and see how it goes. I'm proposing this because there are more compositors out there and I don't want to have to implement a client for every one of them. EDIT: Here's the result. As sway apparently didn't implement the protocol (rootston did, but that's not a WM made for everyday use), I decided to use sway's socket APIs instead: https://github.com/ActivityWatch/aw-watcher-sway |
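For anyone wondering what going through sway's socket API looks like in practice, here is a minimal sketch. It is not the aw-watcher-sway implementation. It assumes `swaymsg` is on the PATH and that `get_tree` returns the usual i3-style JSON tree whose nodes carry `focused`, `name`, `app_id`, `nodes` and `floating_nodes`; the function names are made up for illustration.

```python
import json
import subprocess
from typing import Optional


def find_focused(node: dict) -> Optional[dict]:
    """Depth-first search for the node marked as focused in a sway/i3 tree."""
    if node.get("focused"):
        return node
    # Tiled children live under "nodes", floating ones under "floating_nodes".
    for child in node.get("nodes", []) + node.get("floating_nodes", []):
        found = find_focused(child)
        if found is not None:
            return found
    return None


def focused_window() -> Optional[dict]:
    """Ask sway for its layout tree and return the focused node's app_id and title.

    Assumption: swaymsg is installed and a sway session is running.
    """
    out = subprocess.run(
        ["swaymsg", "-t", "get_tree"],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    node = find_focused(json.loads(out))
    if node is None:
        return None
    return {"app_id": node.get("app_id"), "title": node.get("name")}
```

A watcher would call `focused_window()` on a timer, or subscribe to window events over the same socket instead of polling. Either way it only works on sway, which is exactly the per-compositor duplication this issue is about.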
Bear in mind that it's not only smartphones where that can become a problem. I need this sort of tooling on my desktop PC, so I won't be switching away from X11 until this need is met. In fact, I wrote some example code to show others how to gather that sort of information under X11. |
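The example code referred to above is not reproduced here. As a rough idea of what gathering this information under X11 involves, the sketch below shells out to `xprop`. It assumes an EWMH-compliant window manager that sets `_NET_ACTIVE_WINDOW` on the root window, and the function names are illustrative only. The parsing is deliberately naive and does not handle titles containing escaped quotes.

```python
import re
import subprocess
from typing import Optional


def parse_active_window_id(xprop_root_output: str) -> Optional[str]:
    """Extract the window id from `xprop -root _NET_ACTIVE_WINDOW` output."""
    match = re.search(r"window id # (0x[0-9a-fA-F]+)", xprop_root_output)
    return match.group(1) if match else None


def parse_window_props(xprop_output: str) -> dict:
    """Pull WM_CLASS and _NET_WM_NAME out of `xprop -id <id> ...` output."""
    props = {"wm_class": None, "title": None}
    for line in xprop_output.splitlines():
        key, sep, value = line.partition(" = ")
        if not sep:
            continue  # e.g. "WM_CLASS:  not found."
        if key.startswith("WM_CLASS"):
            # WM_CLASS holds two quoted strings: instance name, then class name.
            props["wm_class"] = re.findall(r'"([^"]*)"', value)
        elif key.startswith("_NET_WM_NAME"):
            props["title"] = value.strip().strip('"')
    return props


def active_window() -> Optional[dict]:
    """Query the running X server. Assumption: xprop is installed."""
    def run(*args: str) -> str:
        return subprocess.run(
            ["xprop", *args], capture_output=True, text=True, check=True
        ).stdout

    win_id = parse_active_window_id(run("-root", "_NET_ACTIVE_WINDOW"))
    if win_id is None or win_id == "0x0":
        return None  # no window currently has focus
    return parse_window_props(run("-id", win_id, "WM_CLASS", "_NET_WM_NAME"))
```

Note that any X11 client can do this without asking anyone, which is both why it is convenient for time trackers and why Wayland compositors are reluctant to offer the same thing.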
Now GNOME Shell has merged a DBus API for this but for yet another use-case, to be able to use the dogtail tool to automatically test GUI applications. https://gitlab.gnome.org/GNOME/gnome-shell/merge_requests/326/diffs And phosh (the Librem 5 shell) has implemented the rootston wlr-foreign-toplevel-management protocol https://source.puri.sm/Librem5/phosh/commit/532cfaf085cd440c3f849e92da8c8d65681c2a9c Would be nice if we could have a solution which worked on both. |
I'm still of the opinion that we don't want sandboxed apps to get into managing foreign toplevels. That is fundamentally a privileged operation |
Well, the demand isn't going to go away. If you don't provide some compromise, you're likely to see an analogue to how, because Google refuses to allow things like YouTube downloaders in their extension store, users are becoming desensitized to being walked through the process of enabling side-loading and installing extensions which haven't had any vetting from a trusted third party. (Or, similarly, how, because virus scanners often report software cracks such as Windows 7 activators as viruses, Windows users grow used to taking the word of random strangers that their virus scanner is giving them a false positive.) You don't want people giving up all sandboxing on programs X, Y, and Z because the sandbox refuses to meet their needs... especially if it's something read-only like "track active window title", that can be handled through a permissions workflow they're probably already familiar with via the OAuth2-based integrations on sites like Twitter and GitHub. I'd probably implement such a thing by writing an un-sandboxed daemon which wraps all the disparate APIs offered by various compositors and exposes a consistent API which can be whitelisted in the sandbox manifest... who knows whether I'd get it right from a security standpoint, but I'll do it nonetheless if remaining on X11 becomes unviable before the capabilities I rely on are officially offered. At least that way, I've made what effort I can to sandbox as heavily as is viable. (UX-wise, my approach would be the aforementioned OAuth2-like approach. Sandboxed client must request an API key, which triggers a permissions prompt from the privileged host. At any time, the user can pull up a list of permission'd applications and modify or revoke permissions. However, I'd also support a "forge" mode like any good Firefox privacy extension.) |
Let's not compare The We already have a permission system so I'm not sure what you are discussing. The question is whether this information ever belongs in a sandbox. If it does, how do we ask the user without an awful "DO YOU WANT TO ALLOW VAGUE PERMISSION: [YES|NO]"? |
My concern is that, if the sandbox is ill-fitted for what users want their applications to do, it will have an outsized effect on how much sandboxing is actually done. ...and I do agree that vague permissions are something to be avoided. I'm just not sure whether it has to be something that would qualify as "vague" in this instance. Something like "Read the title of your currently active window and get notified of changes" seems like a pretty clear thing that would be both intuitively obvious if a time-tracking application asked for it and suitably ominous if anything else did. (Though it'd be most useful if that also covered stuff that a human can derive from the title but a machine would have trouble with, such as the window class.) |
OK So I'm still at my question personally. We have one application interested in using this. Anything else to add to this list?:
|
I write tooling of my own (ad-hoc at the moment) which would use it in ways comparable to ActivityWatch. That's why I said that, if it's not available by the time X11 becomes unfeasible to continue using, I'll hack together some kind of " To be honest, I see this as comparable to the existence of the screenshot portal. Both are very niche things on any desktop where the compositor providers also maintain their own screenshotting solution which could use an internal API or be a compositor plugin. |
The difference with screenshots is that those are very user-centric actions that happen just once and can be canceled upon user review of the contents. This is a more technical detail that will grant unlimited background access to potentially sensitive data that the user can neither verify nor cancel upon review. For example, what if they open the web browser and the window title becomes "Your Name [email protected] login" or something, and the user leaks data they never intended to? Maybe we could limit that information to the app-id of each window (which is not easy to know always), never the window title. |
In that respect, I was responding to the niche-ness (ie. the short list of known potential consumers with Discord crossed off).
That would render it effectively useless for my use case. I (and, I'd assume, ActivityWatch) need to be able to tell the difference between, for example, YouTube and Google Docs (in Firefox) or a masters thesis and a fanfic-writing project (in tools like FocusWriter) or multiple different programming projects (in gVim) or multiple different console applications (via a bit of In the case of applications without a plugin system like Firefox's, the only way to accomplish that without doing something even worse, like live memory inspection, is to watch the window title, where the currently focused document's name is exposed. Also, even if that weren't the case, the whole point is to display quantified read-outs to the user, so accessing something "not easy to know always" like the app-id would require some kind of awkward dance such as "Please focus and then type the display name for each application you wish to track". What about something akin to the warning overlay Firefox displays whenever an application is monitoring the camera or microphone? For example, a tray icon... possibly paired with a notification popup that displays whenever an application starts monitoring and has a "revoke permissions" button. If it annoys me, as a more technical user, I could use KDE's support for forcing tray icons to stay in the menu of inactive icons, and I stay logged in for weeks or months at a time, so I can excuse having to dismiss a notification every time my time-tracker auto-runs on login. I understand your concerns but, at the end of the day, I worry that this is just a case of real life being messy, just like the "theory vs. reality" situations in academia which inspired that famous Einstein paraphrase, "Everything should be made as simple as possible, but not simpler." |
Honestly I think I convinced myself to be against it. Window titles are sensitive and there simply isn't a way to let a user interject themselves in that process. Showing a persistent notification or tray icon isn't supported everywhere, and I think users would just be trained to ignore such a thing. Showing when a camera is active works because that is always privacy-invading, whereas this is only sometimes so and users wouldn't realize it. |
It is OK that not everything will be Flatpak'd. Plenty of system level components cannot be and I think activity monitoring applications may fall under that. |
I'd still prefer to sandbox everything I can, so I'll probably go ahead with my idea to implement some kind of "unofficial portals" daemon. It'd be a nice place to collect anything that's not official merely because of concern about OAuth2-style "persistent grant" permissions. |
I'm developing a Vietnamese input method for IBus. It uses WM_CLASS to
identify which window is active (has focus), so you can assign it to your
favorite input mode (e.g. pre-edit, surrounding text, US input, etc.). My
users like this feature very much, but unfortunately it does not work on Wayland.
Please bring back the WM_CLASS portal from X11 or something like that to
Wayland.
|
If we add both a "width" and a "height" property, I believe it would be possible to add support for dogtail on Wayland for more shells than just GNOME Shell (GNOME Shell currently has a DBus API for this). |
@ssokolow There are many ways to sandbox things. For example you can ship a systemd service with sandbox options. |
I got the impression systemd sandboxing wasn't as well-suited to desktop applications. Am I mistaken or are you proposing that the monitoring and GUI be separate processes sandboxed using different technologies? |
You could split your daemon into its own process that would be a sandboxed DBus service. Your application could then be a Flatpak with permission to talk to it. |
That was essentially what I was envisioning for my idea of writing some kind of "extra/unofficial portals" daemon. |
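To make the "extra/unofficial portals" daemon idea a bit more concrete, here is a rough sketch of only the backend-selection part: deciding which compositor-specific API the daemon would wrap. The D-Bus service itself and any permission prompting are left out. It assumes sway exports `SWAYSOCK` and that the session sets `XDG_SESSION_TYPE` and `XDG_CURRENT_DESKTOP`; the backend labels and the `WindowBackend` interface are hypothetical names, not an existing API.

```python
from typing import Mapping, Optional, Protocol


class WindowBackend(Protocol):
    """What each compositor-specific backend would have to provide (hypothetical)."""

    def active_window(self) -> Optional[dict]:
        ...


def pick_backend_name(env: Mapping[str, str]) -> str:
    """Guess which backend to load from the session environment.

    The returned strings are illustrative labels only.
    """
    if env.get("SWAYSOCK"):
        return "sway-ipc"
    if env.get("XDG_SESSION_TYPE") == "x11":
        return "x11"
    # XDG_CURRENT_DESKTOP may be a colon-separated list such as "ubuntu:GNOME".
    desktop = env.get("XDG_CURRENT_DESKTOP", "").upper()
    if "GNOME" in desktop:
        return "gnome-shell-dbus"
    if "KDE" in desktop:
        return "kwin"
    return "unsupported"
```

In a real daemon one would pass `os.environ` to `pick_backend_name` at startup, and the sandboxed app would only ever see the one consistent D-Bus interface whitelisted in its manifest.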
Well, how is it done? We have all been talking about the fact that we want to get information about certain windows but cannot do it. |
AutoKey is another example of a very powerful app that requires knowing which app/window is in focus when a global shortcut is triggered. |
I looked into it in more depth after writing that blog post and what I found is the software I had in mind (rofi) is using a wlroots extension called "wlr foreign toplevel management" for both getting the list of windows and focusing windows. https://wayland.app/protocols/wlr-foreign-toplevel-management-unstable-v1 https://github.com/lbonn/rofi/blob/wayland/source/modes/wayland-window.c I tried to figure out if there are gnome/KDE equivalents but wasn't able to find anything (this would have been half a year ago now, I think). That said I'm not an expert in wayland protocols, so perhaps someone else knows of something for gnome or KDE. |
@faithanalog The original post I wrote 4 years ago links to a reddit thread about just wlr-foreign-toplevel-management. Unfortunately no one other than sway/wlroots wants to use that protocol, because it allows any Wayland application to read app names and titles, which is a privacy risk. |
Hello! About leaking specific private data, it may be enough, when the app requests access to window tracking, to warn:
Another way is to have this access if the application cannot communicate with other applications and cannot access the network unless access is granted. For this, maybe it is possible to save the data in specific files known by flatpak (specific filenames, file format?) and for which only these are shareable (with the same app for synchronization or parental control, over the network)? |
If an app requests sensitive permissions, a warning is enough. I want convenience and freedom; if it were really all about privacy, why would I use Linux? Developers can call up accessibility features on Mac too. |
Is your proposal that the app wanting to track windows tells the user that it has privacy sensitive features and then open the privacy view in gnome-control-center (automatically or manually with a button when the app tells the user about its features?), and finally that the user should click on the row of the app and enable "Track User Activity"? If so, that's a bit too much, as users will likely find the app and enable the setting for it. It might just annoy them, especially since they just want to use the app. Besides, what's the point of doing that? Prevent users from automatically clicking "Allow" on a dialog with "Allow" and "Deny" responses? |
I got the answer by re-reading one of your comments on the accessibility portal issue. So, I think a new design is needed to avoid the user automatically clicking an Allow-type response, but without annoying them with multiple steps. Going back to the privacy aspect, as I said before, it is important to warn about the potential sharing of private data via internet access or communication with other processes (i.e. tell the user if there is this potential sharing and also when the app does not have this sharing capability). There are two solutions for this:
I think this is very important because:
|
Any security expert will tell you that you're setting up a "false sense of security" situation. Exfiltrating data can be accomplished in all sorts of non-obvious ways. For example:
...or, with everyone feeling so confident about their security, go the xkcd 538-ish route and show the reader a desirable feature that seems to be legitimate evidence that the security model is getting in their way and needs to be circumvented for non-malicious reasons. (Stuff like how I've already seen some applications like keypress visualizers for screencasts going straight to "Either use an X11 session or grant non-root users access to your keyboard's For example, I could easily see a personal time tracking tool encouraging the user to circumvent this so the desktop and mobile versions of the app can synchronize records to produce unified reports. Heck, last time I checked, your proposal would, by its very nature, prevent the core function of the time-tracking software used by some online contractor marketplace services (I know Upwork had one when it was called oDesk), where the dynamic is "If you don't let it watch what you're doing when the timeclock isn't paused and upload the results to the server as a fraud-prevention measure, you don't get paid".) We already see people discussing how to circumvent sandboxing in order to get Flatpak'd/Snap'd browsers and Flatpak'd/Snap'd password managers talking to each other while we're still waiting for a WebExtensions portal. Forcing a "this or that but not both" permissions situation on users and application developers is a bad idea. |
... It is about telling the user that the app is sharing data, so I don't see why it gives a false sense of security. The reference point is the app, not processes external to it. If external processes are watching or taking this data, that isn't relevant to whether the app is sharing the data. The app can store the data in metadata or in structured files, but it must know about the other processes in order to use them to retrieve this data, no? Other permissions can certainly be taken into account: that would only be a matter of reminding the user whether the app is sandboxed or not, and any phrasing can be improved. Some areas can certainly be improved (data transmission over D-Bus, PulseAudio connection, device access, etc.), but I didn't say to take it as is: these are only options, and both are open to discussion on the details. And sharing the data over the internet is not excluded in option 2: that starts with a sandboxed app, then allows it to share the data over the internet (the "private data sharing portal"). Using a portal already generally means not using permissive permissions (e.g. using the file chooser portal generally means not using filesystem=home). If this connection is mandatory, something must then be done to communicate that appropriately. |
I'm referring to option 2 for two reasons: First, it's not feasible to retrofit "Require these apps to have no means of sharing" because there are so many APIs the things have to interact with and bad guys only need to find one of them... and they don't need to convince security auditors... they just need a solution that the average user won't recognize as a path for data exfiltration. Java wasn't even retrofitting to the degree this is and it still had a couple of decades of applet security whac-a-mole before Java applets were finally retired. Things like JavaScript runtimes and WebAssembly can pull it off because they design their APIs from scratch to be simple enough to be audited. Equally importantly, they take a "sandbox first, functionality second if compatible" approach... an approach that, when applied to non-web applications, produces WASI, not Flatpak and Portals. Second, requiring people to do their file access entirely through special portals to get access to the monitoring API is reminding me of what I said recently regarding the idea of an xdg-pip Wayland extension. If you make your solution too onerous and restrictive, nobody will use it. It's already hard enough to get applications to switch away from legacy permissions to portals and, as I said, GNOME's vision of Wayland is already driving application developers to circumvent the security model entirely to deliver the features users want. In this case... probably by asking people to enable whatever accessibility APIs wind up being required to provide assistive technologies for legally recognized disabilities and then requesting "I'm a screen reader" permissions to access the relevant information about the currently focused window without having to give up legacy/manifest file permissions... and, if you try to require accessibility apps to be that locked down, you might wind up with some kind of |
In option 2, I'm talking about a sharing portal, which implies that the application, to use it, has no other way to share the data. If the app has permissive permissions to share data, there is no point in having a portal to share data, because a portal is built to replace the permissive permissions (including those that can be used in deviant ways ). Alternatively, there's option 1, where it's about informing the user about the potential leak of private data from the app and asking the user again to grant permission if the app comes with more permissive permissions with an update. Having one, then the other, depending on how sandboxing evolves over time, is also an option. |
And my point is that "the application, to use it, has no other way to share the data" is an untenable position to enforce unless the entire API surface of the sandbox has been designed around it, the way something like WASI has, and attempting to enforce it will just imply to users that it can be done in a reliable manner. |
There is a new Wayland protocol in the staging section called ext-foreign-toplevel-list[1] which allows clients to get all windows as well as their appid+title. As of now, this has only been implemented in a draft commit for the COSMIC DE[2]; hopefully more will follow. It has a lot of similarities with wlr-foreign-toplevel-management[3], but is more limited as it only lists all windows, and to be able to see which window is focused there is yet another protocol called foreign-toplevel-state[4]. What is more convincing about these two protocols compared to wlr-foreign-toplevel-management is that there is an intention to get them into wayland-protocols. There are still two big drawbacks, however, that make it unlikely that we will be able to use these two protocols anytime soon. The first is simply that most compositors probably won't implement them. The second is that these protocols will only be accessible from so-called "privileged" clients. Exactly how to make a client "privileged" will depend on the compositor, and that is yet another big discussion. Regardless, I am happy to see that at least something is happening in the Wayland ecosystem in regards to this functionality. [1] https://gitlab.freedesktop.org/wayland/wayland-protocols/-/blob/main/staging/ext-foreign-toplevel-list/ext-foreign-toplevel-list-v1.xml |
I'm especially happy to see that someone's finally moving on the concept of privileged clients. That was promised over a decade ago as how the original Wayland concept would allow things like display control panels to not have to be reinvented as an in-process part of every new compositor. |
That's a pretty heavy limitation for such a fundamental feature. |
When printing with a Flatpak application, the notification saying that it is printing reads "Document from %s", where %s is the app's desktop file ID (i.e. it will print "Document from org.gnome.Evince" with Evince). It is not very clear what this name is in the notification, particularly with a third-party app, for example com.github.my_beautiful_github_name.app_name. So this changes %s to the actual translated application name, looked up in the corresponding desktop file entry.
I am a maintainer of the ActivityWatch project, and we poll the currently focused window's name and title. This is not an issue under Xorg, but on Wayland it is a problem as there is no common API between compositors. I have briefly discussed this with both a wlroots and a GNOME developer, and they both seem to agree that exposing such data would be best solved by adding an xdg-desktop-portal API for it.
wlroots and KWin already have APIs for this (gnome-shell too, but it's disabled by default), but they are all different, so an xdg-desktop-portal API would significantly simplify things.
Suggestions on properties for windows, methods and signals that would be good to have:
Window properties:
Signals:
Methods:
Links to prior discussions with Gnome and wlroots developers: