The benefits of Rewind
- Excellent text recognition and indexing
- Near-native design and experience, with well-thought-out interaction details
- Effective control over disk space and processor usage
From “The Entire History of You”, the finale of the first season of Black Mirror, to Ted Chiang’s Hugo Award-nominated novelette The Truth of Fact, the Truth of Feeling (Chiang also wrote the story adapted into the film Arrival), many works of fiction have explored a similar premise: a “replayable memory chip” that records everything you see and hear, intact, so you can retrieve it at any time later.
With today’s technology, such a device is still the stuff of fiction. But doesn’t it sound far more practical if the problem is narrowed down to recording your entire history of using a computer? If you’ve ever racked your brain to recall the contents of a fleeting web page or dialog box, or sighed in frustration at “where did all the time go”, you may have wished you could play back what was on your screen.
But the problem doesn’t seem solvable by brute force alone. If you record the screen continuously, is there enough disk space to store it all? Can the processor and battery bear the overhead? More importantly, once so much data has been recorded, how do you quickly find the useful clips instead of looking for a needle in a haystack? And is privacy protected?
Perhaps because of these obstacles, continuous recording and analysis of screen content used to be limited to specialized uses such as exam proctoring and corporate monitoring, which, much as in fiction, have raised many privacy and ethical concerns.
Rewind is a recent attempt to bring this kind of “infinite memory” to everyone.
As its name suggests, Rewind wants to turn the history of your computer use into a retrievable, searchable database, without consuming too many system resources in the process. Founder Dan Siroker, who went deaf in his twenties and had his life changed by hearing aids in his thirties, has said this experience made him want to build more tools that augment human capabilities.
After several months of private testing, Rewind was publicly announced in early November last year and opened for sign-ups and downloads in mid-December, along with news of a $10 million investment from a16z. Amid Silicon Valley’s chilly winter at the end of 2022, the product became a hot topic thanks to its imaginative idea and relatively polished execution, though not without controversy.
In this article, we introduce and assess Rewind based on our hands-on experience and an analysis of how it works.
People often praise software whose design and interactions follow macOS guidelines and conventions as being “like Apple made it themselves”; Rewind lives up to that description. Pressing the default shortcut Command-Shift-Space, or holding Command and Shift while swiping with two fingers on the trackpad, brings up Rewind’s main interface, which has a distinctly Apple feel reminiscent of the classic Time Machine interface.
The main area of this screen shows the recorded screenshot. At the bottom is a timeline listing the icons of previously used apps in order. (A particularly thoughtful touch: Rewind has nearly a thousand icons for commonly used websites built in and shows them where relevant, which is more intuitive than just showing a browser icon.)
Use the trackpad or mouse wheel to scroll the timeline to access the screen at different times. If you need to jump quickly, you can hold Shift to speed up scrolling; or click on the date text above the timeline and select the date you want to go to from the calendar dialog box that pops up.
Still, Rewind’s signature feature is tucked away in the search box in the middle of the interface.
Enter any text you think has appeared on screen, and Rewind will scour every nook and cranny of your “memory”, displaying a grid of thumbnails in reverse chronological order with the matches highlighted, often including one or two surprising, unexpected finds.
If there are too many results, you can filter them further by app. For example, if you remember a particularly useful web page on some topic that later got lost in a sea of tabs, you can find it faster by limiting the search to just a few browsers.
Unfortunately, that’s about the extent of Rewind’s current search filtering: it supports neither precise date ranges nor logical combinations of multiple search terms. As recorded content accumulates, this will inevitably make searching less efficient.
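To make the search behavior concrete, here is a minimal sketch using Python’s standard sqlite3 module. The `frames` table and its columns are hypothetical, not Rewind’s actual schema. The `apps` filter mirrors what Rewind offers today; the `start`/`end` bounds and the requirement that all terms match illustrate the date-range and multi-term filtering the app currently lacks.

```python
import sqlite3

# Hypothetical schema: one row per indexed frame, holding the capture time,
# the foreground app, and the text recognized in that frame.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE frames (id INTEGER PRIMARY KEY, ts INTEGER, app TEXT, ocr_text TEXT)"
)
conn.executemany(
    "INSERT INTO frames VALUES (?, ?, ?, ?)",
    [
        (1, 100, "Safari", "Notes on Whisper model sizes"),
        (2, 200, "Terminal", "git rebase --interactive HEAD~3"),
        (3, 300, "Chrome", "Whisper base.en benchmark results"),
        (4, 400, "Safari", "whisper.cpp README"),
    ],
)


def search(conn, terms, apps=None, start=None, end=None):
    """Return ids of frames containing ALL terms (case-insensitive), newest first."""
    clauses, params = [], []
    for term in terms:
        # instr() avoids LIKE's special handling of % and _ in user input.
        clauses.append("instr(lower(ocr_text), lower(?)) > 0")
        params.append(term)
    if apps:  # per-app filter, the kind Rewind already offers
        clauses.append("app IN (%s)" % ",".join("?" * len(apps)))
        params.extend(apps)
    if start is not None:  # date-range bounds, which Rewind lacks today
        clauses.append("ts >= ?")
        params.append(start)
    if end is not None:
        clauses.append("ts <= ?")
        params.append(end)
    where = " AND ".join(clauses) or "1"
    sql = f"SELECT id FROM frames WHERE {where} ORDER BY ts DESC"
    return [row[0] for row in conn.execute(sql, params)]


print(search(conn, ["whisper"]))                   # [4, 3, 1]
print(search(conn, ["whisper"], apps=["Safari"]))  # [4, 1]
print(search(conn, ["whisper", "benchmark"]))      # [3]
```

This scans every row; at the scale of months of recordings, a real implementation would more likely rely on a full-text index such as SQLite’s FTS5 extension.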
Speaking of browsers: for some of them, Rewind also supports so-called “deep linking”, which lets you jump straight from a search result back to the original web page. (Currently supported are the old standbys Safari and Chrome, as well as the up-and-comers Brave and Arc, which like to “shake things up”.)
Even in an unsupported browser, or in another kind of app, Rewind can recognize links in the transcribed text and let you click them, as long as the full link is visible on screen.
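How might links be pulled out of recognized text? The sketch below is an assumption, not Rewind’s method: a deliberately simple regular expression that finds `http(s)://` URLs and bare `www.` hosts in OCR output and strips trailing punctuation. It also shows why the full link has to be visible: a URL cut off on screen yields a truncated string that no pattern can repair.

```python
import re

# Deliberately simple: http(s) URLs or bare "www." hosts, up to the next
# whitespace, quote, or angle bracket.
URL_RE = re.compile(r"\b(?:https?://|www\.)[^\s<>\"']+", re.IGNORECASE)


def extract_links(ocr_text):
    """Return clickable URLs found in a block of recognized text."""
    links = []
    for match in URL_RE.finditer(ocr_text):
        url = match.group(0).rstrip(".,;:!?)")  # drop trailing punctuation
        if url.lower().startswith("www."):
            url = "https://" + url  # assume HTTPS for bare hosts
        links.append(url)
    return links


print(extract_links("See https://example.com/docs, or www.example.org."))
# ['https://example.com/docs', 'https://www.example.org']
```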
You may have noticed that the placeholder in Rewind’s search box reads “Search for anything you see, say, and hear”. Indeed, besides recording and searching on-screen text, Rewind can also search audio. Recording and indexing spoken content is in fact exactly what the Rewind team worked on in their previous venture, Scribe; Rewind can be seen as building on that foundation and extending the scope of recording even further.
Note that, unlike screen content, audio is not recorded continuously: Rewind records automatically only during online meetings, or when the user turns recording on and off manually via the Recording command in the menu bar. This makes sense given the potential impact of recording on other people’s privacy, but automatic detection currently works only for Zoom meetings, which is clearly too limited.
In addition, Rewind can only record what it “hears” from the system’s current input device, typically the built-in microphone. In other words, if you want to record the other participants in a meeting, you can only do so by …… keeping the speakers on and playing out loud, because that’s the only way for the microphone to “hear” them. It works, but it’s hardly elegant.
(If you’re willing to tinker, you can use a tool like Loopback to combine the microphone input and the meeting app’s output into a single virtual input device, letting Rewind “hear” more; but that’s beyond the scope of this article.)
Finally, given the breadth of information that Rewind records, there are a few privacy-related issues worth remembering.
- If you add apps to Rewind’s exclusion list, Rewind will “pretend not to see” them and produce recordings without any of their windows; it also excludes browsers’ private windows by default. However, Rewind cannot exclude the system interface, so thumbnails of excluded apps may still slip into, for example, the system’s multitasking screen.
- If what you’re doing is especially confidential, you can pause recording at any time from the menu bar icon.
- If you see something in the timeline or search results that makes you cringe, press Delete to remove the recording clip currently on screen.
- Laws on recording calls vary from country to country, and other participants cannot tell that Rewind is recording, so it is always prudent and courteous to let them know when you use it.
Technical analysis and potential problems
As mentioned at the beginning, the idea behind Rewind is not complicated, even a bit crude: record the screen, run text recognition, and you can search it later; keep recording in the background and keep running text recognition, and you can search everything, all the time.
But nobody did this before not because they hadn’t thought of it or couldn’t build it, but because it was hard to deliver a good experience while keeping processor, disk, and battery consumption at an acceptable level.
How does Rewind do it?
The underlying technology: a patchwork of other people’s work
The core idea behind Rewind is to “stand on the shoulders of giants”. It’s easy to notice that Rewind has fairly demanding system requirements, which usually means the software relies on relatively new features and APIs. And so it does: Rewind makes heavy use of the hardware and software capabilities Apple provides.
On the software side, Rewind records the screen mainly through ScreenCaptureKit, a framework introduced in macOS 12.3.
Besides offering better performance and more output formats than previous APIs, ScreenCaptureKit gives capture tools detailed information about displays, applications, and windows, so developers can precisely control what is included in or hidden from a capture. This is how Rewind excludes specific apps from its recordings to protect privacy.
(Some earlier screenshot tools could produce Photoshop files with each window on its own layer; however, the API involved is obscure and rarely used, and Apple has announced that it will be deprecated.)
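For a rough picture of what window-level exclusion means, here is a hypothetical Python model of the decision. In Swift, ScreenCaptureKit expresses this through `SCContentFilter`, for example its `init(display:excludingApplications:exceptingWindows:)` initializer; the `Window` type, the `is_private` flag, and the bundle IDs below are illustrative assumptions. Because such a filter works on app windows, it helps explain why system-drawn UI such as multitasking thumbnails can still show excluded content.

```python
from dataclasses import dataclass


@dataclass
class Window:
    window_id: int
    bundle_id: str
    is_private: bool = False  # hypothetical: e.g. a private browsing window


def windows_to_capture(windows, excluded_bundle_ids, exclude_private=True):
    """Return the windows that should appear in the recorded frame."""
    kept = []
    for w in windows:
        if w.bundle_id in excluded_bundle_ids:
            continue  # the user asked Rewind to "pretend not to see" this app
        if exclude_private and w.is_private:
            continue  # private browser windows are skipped by default
        kept.append(w)
    return kept


windows = [
    Window(1, "com.apple.Safari"),
    Window(2, "com.apple.Safari", is_private=True),
    Window(3, "com.example.passwordmanager"),  # hypothetical bundle ID
    Window(4, "com.apple.Terminal"),
]
kept = windows_to_capture(windows, {"com.example.passwordmanager"})
print([w.window_id for w in kept])  # [1, 4]
```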
Rewind’s text recognition likewise relies on the native Vision framework, and its ability to recognize links in the interface uses the same interface as the system’s Live Text feature.
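Vision reports each recognized text region as a bounding box in normalized coordinates (0 to 1) with the origin at the lower-left corner of the image, whereas on-screen highlighting usually works in pixels from the top-left. Below is a minimal Python sketch of that conversion and of picking out the boxes that match a query; the `observations` list is made-up data standing in for Vision’s results.

```python
def normalized_to_pixels(box, width, height):
    """Convert a normalized (x, y, w, h) box with a bottom-left origin into
    integer pixel coordinates (left, top, width, height) with a top-left origin."""
    x, y, w, h = box
    left = round(x * width)
    top = round((1.0 - y - h) * height)  # flip the vertical axis
    return left, top, round(w * width), round(h * height)


def highlight_boxes(observations, query, width, height):
    """Pixel rectangles of recognized strings that contain the query."""
    q = query.lower()
    return [
        normalized_to_pixels(box, width, height)
        for text, box in observations
        if q in text.lower()
    ]


# Stand-in for OCR output: (recognized string, normalized bounding box).
observations = [
    ("Rewind pricing: $20/month", (0.25, 0.5, 0.5, 0.125)),
    ("Settings", (0.0, 0.875, 0.125, 0.0625)),
]
print(highlight_boxes(observations, "pricing", 1024, 768))  # [(256, 288, 512, 96)]
```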
The one exception is speech-to-text. Here, for whatever reason, Rewind does not use macOS’s native speech framework (SFSpeechRecognizer), but rather a C++ port of OpenAI Whisper. Judging from files bundled inside the app, Rewind ships the English-only “base” model (ggml-base.en), which trades a slightly higher error rate for faster processing.
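OpenAI’s Whisper models take 16 kHz mono audio and process it in 30-second windows, so a long recording has to be cut up before transcription, keeping each window’s start time so the transcript can later be tied back to the timeline. A simplified Python sketch of that step, assuming the audio is already decoded into 16 kHz float samples; the segmenting logic in the actual C++ port is more involved.

```python
SAMPLE_RATE = 16_000                         # Whisper expects 16 kHz mono input
CHUNK_SECONDS = 30                           # Whisper's fixed input window
CHUNK_SAMPLES = SAMPLE_RATE * CHUNK_SECONDS  # 480,000 samples per window


def chunk_audio(samples):
    """Split samples into 30 s windows, zero-padding the last one.

    Returns a list of (start_time_in_seconds, window_samples) pairs.
    """
    chunks = []
    for start in range(0, len(samples), CHUNK_SAMPLES):
        window = samples[start:start + CHUNK_SAMPLES]
        if len(window) < CHUNK_SAMPLES:
            window = window + [0.0] * (CHUNK_SAMPLES - len(window))
        chunks.append((start / SAMPLE_RATE, window))
    return chunks


meeting = [0.0] * (SAMPLE_RATE * 70)  # 70 seconds of (silent) audio
print([start for start, _ in chunk_audio(meeting)])  # [0.0, 30.0, 60.0]
```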
On the hardware side, the fact that Rewind supports only newer Macs with Apple silicon shows that it relies on features of the M-series chips, specifically their mix of performance and efficiency cores and the Neural Engine. The former saves power by assigning some background tasks to the more power-efficient small cores; the latter accelerates image analysis, while video encoding and decoding are handled by the chips’ dedicated media hardware.
It should also be noted that Rewind’s claims about its compression technology are somewhat exaggerated. Rewind records at a “slideshow” rate of one frame every two seconds (0.5 fps), which is enough for its purposes, but that makes comparing it with ordinary video, generally understood to run at 30 fps or more, unfair.
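A quick calculation shows how much of a headline compression figure the frame rate alone accounts for. The 8-hour workday is an assumption chosen for illustration.

```python
SECONDS_PER_HOUR = 3600


def frame_count(fps, hours):
    """Number of frames captured at a given frame rate over some hours."""
    return round(fps * hours * SECONDS_PER_HOUR)


rewind_frames = frame_count(0.5, 8)  # 8 h at one frame every 2 s
video_frames = frame_count(30, 8)    # 8 h of ordinary 30 fps video
print(rewind_frames, video_frames, video_frames // rewind_frames)
# 14400 864000 60
```

Before any encoding happens, recording at 0.5 fps already stores 60 times fewer frames than 30 fps video, so a compression ratio quoted against 30 fps video gets that factor of 60 for free.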
Rewind also says it achieves a dynamic compression ratio through what it calls a “trade secret” method: the less the screen content changes, the higher the compression ratio. But that is really just a basic capability of modern video codecs, which reduce data by predicting each frame from its neighbors. For example, an uncompressed 4K frame at 10 bits per pixel takes (3840 × 2160 × 10) / 8 = 10,368,000 bytes, about 10.37 MB (three 10-bit color channels per pixel would triple that), yet with bi-directional prediction (B-frames) it might be compressed to a few KB, which by itself already reaches the kind of ratio Rewind claims.
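The idea behind inter-frame prediction can be illustrated without a real video codec. The Python sketch below is a deliberately crude stand-in, not how H.264/HEVC or Rewind actually work: it XORs a frame with the previous one, so unchanged bytes become zeros, and then compresses the result with zlib. It also reproduces the raw-size arithmetic above.

```python
import random
import zlib


def raw_frame_bytes(width, height, bits_per_pixel):
    """Size in bytes of one uncompressed frame."""
    return width * height * bits_per_pixel // 8


print(raw_frame_bytes(3840, 2160, 10))  # 10368000 bytes, about 10.37 MB

# Toy 8-bit grayscale frames: random bytes stand in for busy screen content.
rng = random.Random(0)
W, H = 320, 180
prev = rng.randbytes(W * H)
cur = bytearray(prev)
cur[1000:1100] = rng.randbytes(100)  # only 100 of 57,600 bytes change
cur = bytes(cur)


def xor_delta(a, b):
    """Byte-wise difference: zero wherever the two frames agree."""
    return bytes(x ^ y for x, y in zip(a, b))


intra = len(zlib.compress(cur, 9))                   # frame coded on its own
inter = len(zlib.compress(xor_delta(prev, cur), 9))  # coded against prev
print(intra, inter)  # the delta compresses to a tiny fraction of the frame
```

The less the screen changes between frames, the more zeros the delta contains and the smaller it compresses: the “dynamic compression ratio” in miniature.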
Another natural question is how Rewind stores so much data. In fact, although the result looks like science fiction, the approach is fairly simple, even somewhat rudimentary.
The most important file is db.sqlite3, a SQLite database that holds the recording history; the title, app, and switch time of each foreground window; the text extracted from video and audio; and the frame number and coordinates at which each piece of text appears in the original recording.
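Here is a hypothetical sketch of how such a database could map a text match back to a spot in a video file. The table and column names, the `chunks/example.mp4` path, and the assumption that frames are stored at exactly 0.5 fps are all illustrative, not Rewind’s actual schema.

```python
import sqlite3

FPS = 0.5  # one frame every two seconds, as described above

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE video (id INTEGER PRIMARY KEY, path TEXT);
CREATE TABLE frame (
    id INTEGER PRIMARY KEY,
    video_id INTEGER REFERENCES video(id),
    frame_index INTEGER,          -- position of the frame inside the video
    app TEXT, window_title TEXT, ts INTEGER
);
CREATE TABLE text_box (           -- one row per recognized string
    frame_id INTEGER REFERENCES frame(id),
    text TEXT, x INTEGER, y INTEGER, w INTEGER, h INTEGER
);
INSERT INTO video VALUES (1, 'chunks/example.mp4');
INSERT INTO frame VALUES (1, 1, 0, 'Safari', 'Docs', 1000),
                         (2, 1, 7, 'Terminal', 'zsh', 1014);
INSERT INTO text_box VALUES (1, 'Welcome', 10, 10, 100, 20),
                            (2, 'brew install ffmpeg', 40, 600, 300, 20);
""")


def locate(conn, term):
    """Return (video path, frame index, seek offset in seconds, box) per match."""
    rows = conn.execute(
        """SELECT v.path, f.frame_index, t.x, t.y, t.w, t.h
           FROM text_box t
           JOIN frame f ON t.frame_id = f.id
           JOIN video v ON f.video_id = v.id
           WHERE instr(lower(t.text), lower(?)) > 0
           ORDER BY f.ts DESC""",
        (term,),
    )
    return [(path, idx, idx / FPS, (x, y, w, h)) for path, idx, x, y, w, h in rows]


print(locate(conn, "ffmpeg"))
# [('chunks/example.mp4', 7, 14.0, (40, 600, 300, 20))]
```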
How good is this storage structure? I’m no expert in this area, but some improvements seem fairly obvious. For example, storing an additional full-size screenshot when the full video already exists seems redundant and takes up a lot of space. Also, the storage hierarchy is very flat: over time, data directories such as chunks will accumulate a huge number of subdirectories, which may slow down traversal.
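One common remedy for an overly flat hierarchy is to shard files into nested date-based directories, so that no single directory grows without bound. A minimal sketch with a hypothetical naming scheme (`root/YYYY/MM/DD/<chunk id>.mp4`); whether it measurably helps depends on the file system and on how Rewind actually traverses its data.

```python
from datetime import datetime
from pathlib import PurePosixPath


def sharded_chunk_path(root, ts, chunk_id):
    """Place each chunk under root/YYYY/MM/DD/ instead of one flat directory."""
    return PurePosixPath(root, f"{ts:%Y}", f"{ts:%m}", f"{ts:%d}", f"{chunk_id}.mp4")


path = sharded_chunk_path("chunks", datetime(2023, 1, 15, 9, 30), "a1b2c3")
print(path)         # chunks/2023/01/15/a1b2c3.mp4
print(path.parent)  # chunks/2023/01/15: one day's chunks in one directory
```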
Indeed, when I tested it on my M1 Max Mac, after a month of use, scrolling through search results involved noticeable waits: older results loaded in batches of about ten, and the wait grew longer the further down the list I scrolled. Since Rewind even lets users keep recordings forever, the slowdown after years of use could be even more pronounced.
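How Rewind pages through results isn’t documented, but waits that grow the further you scroll are typical of OFFSET-based paging, where the database computes and discards every skipped row before returning the next batch. A common alternative is keyset (“seek”) pagination, which resumes from the last row already shown. A minimal Python/sqlite3 sketch with a hypothetical `hit` table and a page size of ten to match the batches observed above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hit (frame_id INTEGER PRIMARY KEY, ts INTEGER)")
conn.execute("CREATE INDEX hit_ts ON hit (ts, frame_id)")
conn.executemany("INSERT INTO hit VALUES (?, ?)", [(i, 1000 + i) for i in range(1, 101)])

PAGE = 10  # results per batch


def page_offset(conn, page):
    """OFFSET paging: SQLite still produces and discards page * PAGE rows."""
    rows = conn.execute(
        "SELECT frame_id FROM hit ORDER BY ts DESC, frame_id DESC LIMIT ? OFFSET ?",
        (PAGE, page * PAGE),
    )
    return [r[0] for r in rows]


def page_after(conn, last=None):
    """Keyset paging: resume strictly after the (ts, frame_id) of the last row shown."""
    if last is None:
        rows = conn.execute(
            "SELECT frame_id, ts FROM hit ORDER BY ts DESC, frame_id DESC LIMIT ?",
            (PAGE,),
        )
    else:
        rows = conn.execute(
            "SELECT frame_id, ts FROM hit WHERE (ts, frame_id) < (?, ?) "
            "ORDER BY ts DESC, frame_id DESC LIMIT ?",
            (last[0], last[1], PAGE),
        )
    return rows.fetchall()


# Walk all results batch by batch, as a scrolling UI would.
seen, last = [], None
while batch := page_after(conn, last):
    seen.extend(frame_id for frame_id, _ in batch)
    last = (batch[-1][1], batch[-1][0])  # (ts, frame_id) of the last row

print(seen[:3], len(seen))       # [100, 99, 98] 100
print(page_offset(conn, 3)[:3])  # [70, 69, 68]
```

With an index on `(ts, frame_id)`, each keyset query can seek straight to its starting point instead of re-walking earlier pages.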
Performance and energy consumption: running like a ninja, consuming power like a bandit
Another common concern is energy consumption. As mentioned above, Rewind runs only on Apple silicon Macs and is designed to take advantage of the M-series chips’ mix of performance and efficiency cores and their Neural Engine. According to official figures, on M1-series chips it typically uses 20 to 40 percent of a single core’s processing time.
In my testing, this figure holds up, and Rewind does actively use the efficiency cores. However, usage spikes when it processes video for indexing, briefly saturating up to two efficiency cores.
Overall, Rewind’s performance is well optimized and running in the background is largely unobtrusive and does not impact daily operations.
The impact on battery life, however, is less rosy. A common metric here is Energy Impact, found under the Energy tab in Activity Monitor.
(Energy Impact is calculated based on a combination of processor usage, how often the processor wakes up, and factors such as disk writes, GPU usage, and network activity, which are weighted differently on different Mac models, but can be used as a reference for power consumption by comparing it to other programs on the same device.)
Rewind does top the list on this metric: its 12-hour average on my computer was often around 700. Energy Impact does not map linearly to power draw, but readings at this level clearly mean a noticeable hit to battery life; some users have reported that Rewind cuts battery life by 20 to 40 percent.
(For reference, Safari, known for its power efficiency, typically reads in the single digits or low double digits; the more power-hungry Chrome is often in the high double digits, sometimes reaching the hundreds. In addition, an app whose Energy Impact consistently reaches 20 or 30 will show up in the list of apps “Using Significant Energy” in the battery menu.)
So when using a MacBook on battery, it’s a good idea to pause Rewind for a while. The team says it will add a feature that automatically stops transcoding and indexing when the battery is low, which is something to look forward to.
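The team hasn’t described how this will work; the sketch below is one hypothetical shape for it. Recording continues, but chunks are queued and the expensive transcoding and indexing are held back while the Mac is on battery below a threshold, then the backlog is released once conditions improve. The class name and the 20% threshold are arbitrary assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class IndexScheduler:
    low_battery_pct: int = 20  # hypothetical threshold
    pending: list = field(default_factory=list)

    def can_index(self, on_battery, battery_pct):
        """Index freely on AC power; on battery, only above the threshold."""
        return not on_battery or battery_pct > self.low_battery_pct

    def submit(self, chunk, on_battery, battery_pct):
        """Queue a freshly recorded chunk and return the chunks to process now."""
        self.pending.append(chunk)
        if not self.can_index(on_battery, battery_pct):
            return []  # recording continues; transcoding and OCR wait
        ready, self.pending = self.pending, []
        return ready


s = IndexScheduler()
print(s.submit("chunk-1", on_battery=True, battery_pct=15))   # []
print(s.submit("chunk-2", on_battery=True, battery_pct=12))   # []
print(s.submit("chunk-3", on_battery=False, battery_pct=12))  # ['chunk-1', 'chunk-2', 'chunk-3']
```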
Privacy: technical hurdles are easy to clear, psychological ones are hard to overcome
Since the private beta began, many people have said their first reaction on learning how Rewind works was unease. That is understandable: having third-party software record your screen and recognize text from morning to night is a bit like having a camera running at home all day, and overcoming those concerns requires strong trust in the brand.
Here’s another wrinkle: early versions of Rewind did not actually run entirely offline, but used a cloud service for speech-to-text (see the help page archive). This drew some user backlash at the time and contradicted the “data stays local” pitch on the home page. Fortunately, the Rewind team listened to the criticism and made a change two days after launch, switching to offline transcription with a built-in OpenAI Whisper model.
But even running fully offline, Rewind, as a startup, has not yet built enough of a track record and credibility to put users’ concerns to rest, and its marketing and communication have been somewhat counterproductive.
In addition, as discussed above, the video files and text database Rewind generates are currently stored unencrypted and can be opened directly by anyone who knows the path. Since continuous recording will inevitably capture plaintext passwords or other sensitive information, encrypting this data at rest would be a reasonable way to improve security.
However, in its answers on the website and on Twitter, the company has taken a rather misleading line, stating that recorded screen data is protected by encryption as long as macOS’s built-in FileVault is enabled. But FileVault is full-disk encryption: it protects data while the Mac is shut down, and offers no extra protection for any particular app’s files once the system is running and has been compromised. By that logic, any random Word document on your desktop would count as “encrypted” too, which is absurd.
The privacy concerns Rewind raises are a reminder that as software capabilities keep expanding, privacy and ethics become constraints that cannot be ignored, alongside technology and performance (as a string of recent cases in the AI industry shows); developers need to consider these potential consequences from the very start of design.
Payment model and sustainability
Like most venture-backed products from Silicon Valley, Rewind’s pricing reflects a distinctly first-world mindset: $20 per month.
From the Rewind team’s point of view, they are building a lifesaver, and users will easily feel they’re getting more than their money’s worth; to back this up, there’s even an official collection of grateful tweets. But clearly, even users in Europe and North America are unlikely to casually add such a hefty item to their already bloated subscription lists.
In my view, if Rewind’s pricing feels unreasonable, the problem is neither subscription as such nor the price itself, but the choice of a flat-rate subscription.
In terms of development cost, Rewind’s underlying technology comes from system frameworks or third parties, and it relies on the storage and computing power of the user’s own computer, so the product is arguably mainly a repackaging of off-the-shelf resources. This is not to downplay Rewind’s innovation, but it does mean Rewind needs a stronger case for where its value lies.
In terms of value to the user, Rewind resembles a backup or data recovery tool: a kind of insurance policy you keep around for a long time for the rare moment you need it. In the month I tried it, for example, I only really needed it twice: once to recover a reply I had typed into a web page that I then closed by mistake, and once to retrieve a terminal command I had spent ages working out and then forgot. Each time was a relief, but was it worth $20 a month? It doesn’t seem so.
So it’s not that a product like this can’t charge on a recurring basis, but a flat rate is a poor fit and not very convincing to users.
If I were to put myself in the shoes of a “shrewd businessman”, one model worth considering would be to offer basic background recording for free while charging according to how often the core features, search and the timeline, are used. The measure could be the number of searches and screenshot operations, or the time spent in the timeline interface. This would attract more users, and the fees would more fairly reflect the actual value users get.
Besides the payment model, Rewind’s own sustainability is also worth considering. Building a business on technology provided by others is always risky: if the macOS APIs Rewind depends on were to change significantly, it would be a major challenge, even a matter of life and death, for the app. This architecture also makes going cross-platform very expensive, comparable to rebuilding from scratch. The official Q&A says the team will focus on macOS for the “foreseeable future” and may only “consider” supporting other systems and mobile this year; I suspect the latter half of that sentence is, for now, just a casual remark.
Moreover, tying itself to the Apple ecosystem is not necessarily safe either: Apple has long had a habit of quietly watching popular products and building their features into the system when the time is right. From Notification Center replacing Growl to Sidecar replacing Astropad, this practice, known as “Sherlocking”, has long hung over Mac developers like the sword of Damocles. And given how deeply Rewind is tied to Apple’s hardware and software, and how its features bear on information security, no one is better placed to build this feature (even a simplified version) than Apple itself, which would also be more likely to win over users who hesitate to try it because of privacy concerns.
If I had to sum up how Rewind feels in one phrase, it would probably be “Pandora’s box”. On the one hand, it offers an attractive vision, the ability to “rewind”; on the other, it leaves many open questions about performance, privacy, and business model. The result is a peculiar app that people can’t resist trying, yet can’t help feeling vaguely uneasy about once they actually use it.
But Rewind’s value lies not only in its functionality, but also in its thinking. Few apps do what Rewind does, “mobilize all the resources you can, unite all the forces you can”.
Rewind takes full advantage of the hardware and software offered by the Apple platform to keep the cost of continuous screen recording within acceptable limits, thus making an effect that previously existed only in the imagination a reality.
Rewind recognizes that files and data are not the only things worth indexing and searching; the very process of interacting with a computer can be valuable to the user. Previously, only developers and advertisers collected usage data, for analytics and tracking; Rewind indexes and presents this data in a meaningful way that better serves its true owners: the users themselves.
Rewind demonstrates how combining “locally suboptimal” choices can produce a “globally optimal” result. If you only need data from a single app, integrating with its standard interfaces and protocols is clearly the way to go. But once the scope expands to your entire usage history, integrating with every app individually is simply unrealistic, and crude screenshot-based text extraction becomes the only “common denominator” that works across all applications: a “universal exchange format”.
All in all, Rewind may still have many problems, and it may not be the ultimate “cure for regret”, but it is a genuinely interesting piece of work, full of clever ideas. Whatever you decide, I believe trying it will leave you inspired and with plenty to think about.