sapling/eden/fs/integration/materialized_query_test.py

132 lines
5.2 KiB
Python
Raw Normal View History

#!/usr/bin/env python3
#
# Copyright (c) 2016, Facebook, Inc.
# All rights reserved.
#
# This source code is licensed under the BSD-style license found in the
# LICENSE file in the root directory of this source tree. An additional grant
# of patent rights can be found in the PATENTS file in the same directory.
import errno
import os
import stat
from .lib import testcase
from facebook.eden import EdenService
@testcase.eden_repo_test
class MaterializedQueryTest:
'''Check that materialization is represented correctly.'''
def populate_repo(self):
self.repo.write_file('hello', 'hola\n')
self.repo.write_file('adir/file', 'foo!\n')
self.repo.write_file('bdir/test.sh', '#!/bin/bash\necho test\n',
mode=0o755)
self.repo.write_file('bdir/noexec.sh', '#!/bin/bash\necho test\n')
self.repo.symlink('slink', 'hello')
self.repo.commit('Initial commit.')
def setUp(self):
super().setUp()
self.client = self.get_thrift_client()
self.client.open()
def tearDown(self):
self.client.close()
super().tearDown()
def test_noEntries(self):
items = self.client.getMaterializedEntries(self.mount)
additional query API for our thrift interface Summary: This diff adds a couple more things to our thrift interface: 1. Introduces JournalPosition 2. Adds methods to query the current JournalPosition and obtain a delta since a given JournalPosition 3. Augments getMaterializedFiles to also return the current JournalPosition 4. Adds a method to evaluate a `glob` against Eden 5. Adds a method using thrift streaming to subscribe to realtime changes Could probably finesse the naming a little bit. The JournalPosition allows reasoning about changes to files that are not part of an Eden snapshot. Internally the journal position is just the SequenceNumber from the journal datastructures, but when we expose it to clients we need to be able to distinguish between a sequence number from the current instance of the eden service and a prior incarnation (eg: if the process has been restarted, and we have no way to recreate the journal we need to be able to indicate this to the client if they ask about changes in that range). For the convenience of the client we also include the `toHash` (the most recent hash from the journal entry) which is likely useful for the `hg` dirstate operations; it is useful to know that the snapshot may have changed since the last query about the dirstate. The `getFileInformation` method returns the instantaneously available `stat()` like information about the requested list of files. Since we simply don't have historical data on how files in the overlay looked (only how they look now), this method does not allow passing in a JournalPosition. When it comes to comparing historical data, we will need to add an API that accepts two snapshot hashes and generates the results from there. This particular method is geared up to understanding the current state of the world; the obvious use case is plugging in the file list from `getFilesChangedSince` into this function to figure out what's what. * Do we want a function that combines `getFilesChangedSince` + `getFileInformation` into a single RPC? Why is there a glob method? It's to support a use-case in the watchman/buck integration. I'm just sketching it out in the thrift interface at this stage. In the future we also need to be able to express how to carry out a tree walk, but that will require some query predicates that I don't want to get hung up on specifying immediately. Why is the streaming stuff in its own thrift file? We can't generate code for it in java or perhaps also python. It's only needed to plumb data into watchman so it's broken out into its own definition. Nothing depends on that file yet, so it's probably not specified quite right. The important thing is how the subscribe method looks: it's essentially the same as the method to query a delta, but it keeps emitting deltas as they are produced. This is another API that will benefit from query predicates when we get around to specifying them. I've added `JournalDelta::fromHash` and `JournalDelta::toHash` to hold the appropriate snapshot ids in the journal entry; this will allow us to indicate when we've checked out a new snapshot, or created a new snapshot. We have no way to populate these yet; I commented on D3762646 about storing the `snapshotID` that we have during `EdenServiceHandler::mountImpl` into either the `EdenMount` or the proposed `RootInode` class. Once we have that we can simply sample it and store it as we generate `JournalDelta`s. Reviewed By: simpkins Differential Revision: D3860804 fbshipit-source-id: 896c24c354e6f58328fb45c24b16915d9e937108
2016-09-19 22:48:12 +03:00
self.assertEqual({}, items.fileInfo)
pos = self.client.getCurrentJournalPosition(self.mount)
self.assertEqual(1, pos.sequenceNumber)
self.assertNotEqual(0, pos.mountGeneration)
changed = self.client.getFilesChangedSince(self.mount, pos)
self.assertEqual(0, len(changed.paths))
self.assertEqual(pos, changed.fromPosition)
self.assertEqual(pos, changed.toPosition)
def test_invalidProcessGeneration(self):
# Get a candidate position
pos = self.client.getCurrentJournalPosition(self.mount)
# poke the generation to a value that will never manifest in practice
pos.mountGeneration = 0
with self.assertRaises(EdenService.EdenError) as context:
self.client.getFilesChangedSince(self.mount, pos)
self.assertEqual(errno.ERANGE, context.exception.errorCode,
msg='Must return ERANGE')
def test_addFile(self):
initial_pos = self.client.getCurrentJournalPosition(self.mount)
self.assertEqual(1, initial_pos.sequenceNumber)
name = os.path.join(self.mount, 'overlaid')
with open(name, 'w+') as f:
pos = self.client.getCurrentJournalPosition(self.mount)
self.assertEqual(2, pos.sequenceNumber,
msg='creating a file bumps the journal')
changed = self.client.getFilesChangedSince(self.mount, initial_pos)
self.assertEqual(['overlaid'], changed.paths)
self.assertEqual(initial_pos.sequenceNumber + 1,
changed.fromPosition.sequenceNumber,
msg='changes start AFTER initial_pos')
f.write('NAME!\n')
pos_after_overlaid = self.client.getCurrentJournalPosition(self.mount)
self.assertEqual(3, pos_after_overlaid.sequenceNumber,
msg='writing bumps the journal')
changed = self.client.getFilesChangedSince(self.mount, initial_pos)
self.assertEqual(['overlaid'], changed.paths)
self.assertEqual(initial_pos.sequenceNumber + 1,
changed.fromPosition.sequenceNumber,
msg='changes start AFTER initial_pos')
additional query API for our thrift interface Summary: This diff adds a couple more things to our thrift interface: 1. Introduces JournalPosition 2. Adds methods to query the current JournalPosition and obtain a delta since a given JournalPosition 3. Augments getMaterializedFiles to also return the current JournalPosition 4. Adds a method to evaluate a `glob` against Eden 5. Adds a method using thrift streaming to subscribe to realtime changes Could probably finesse the naming a little bit. The JournalPosition allows reasoning about changes to files that are not part of an Eden snapshot. Internally the journal position is just the SequenceNumber from the journal datastructures, but when we expose it to clients we need to be able to distinguish between a sequence number from the current instance of the eden service and a prior incarnation (eg: if the process has been restarted, and we have no way to recreate the journal we need to be able to indicate this to the client if they ask about changes in that range). For the convenience of the client we also include the `toHash` (the most recent hash from the journal entry) which is likely useful for the `hg` dirstate operations; it is useful to know that the snapshot may have changed since the last query about the dirstate. The `getFileInformation` method returns the instantaneously available `stat()` like information about the requested list of files. Since we simply don't have historical data on how files in the overlay looked (only how they look now), this method does not allow passing in a JournalPosition. When it comes to comparing historical data, we will need to add an API that accepts two snapshot hashes and generates the results from there. This particular method is geared up to understanding the current state of the world; the obvious use case is plugging in the file list from `getFilesChangedSince` into this function to figure out what's what. * Do we want a function that combines `getFilesChangedSince` + `getFileInformation` into a single RPC? Why is there a glob method? It's to support a use-case in the watchman/buck integration. I'm just sketching it out in the thrift interface at this stage. In the future we also need to be able to express how to carry out a tree walk, but that will require some query predicates that I don't want to get hung up on specifying immediately. Why is the streaming stuff in its own thrift file? We can't generate code for it in java or perhaps also python. It's only needed to plumb data into watchman so it's broken out into its own definition. Nothing depends on that file yet, so it's probably not specified quite right. The important thing is how the subscribe method looks: it's essentially the same as the method to query a delta, but it keeps emitting deltas as they are produced. This is another API that will benefit from query predicates when we get around to specifying them. I've added `JournalDelta::fromHash` and `JournalDelta::toHash` to hold the appropriate snapshot ids in the journal entry; this will allow us to indicate when we've checked out a new snapshot, or created a new snapshot. We have no way to populate these yet; I commented on D3762646 about storing the `snapshotID` that we have during `EdenServiceHandler::mountImpl` into either the `EdenMount` or the proposed `RootInode` class. Once we have that we can simply sample it and store it as we generate `JournalDelta`s. Reviewed By: simpkins Differential Revision: D3860804 fbshipit-source-id: 896c24c354e6f58328fb45c24b16915d9e937108
2016-09-19 22:48:12 +03:00
info = self.client.getMaterializedEntries(self.mount)
self.assertEqual(pos_after_overlaid, info.currentPosition,
msg='consistent with getCurrentJournalPosition')
additional query API for our thrift interface Summary: This diff adds a couple more things to our thrift interface: 1. Introduces JournalPosition 2. Adds methods to query the current JournalPosition and obtain a delta since a given JournalPosition 3. Augments getMaterializedFiles to also return the current JournalPosition 4. Adds a method to evaluate a `glob` against Eden 5. Adds a method using thrift streaming to subscribe to realtime changes Could probably finesse the naming a little bit. The JournalPosition allows reasoning about changes to files that are not part of an Eden snapshot. Internally the journal position is just the SequenceNumber from the journal datastructures, but when we expose it to clients we need to be able to distinguish between a sequence number from the current instance of the eden service and a prior incarnation (eg: if the process has been restarted, and we have no way to recreate the journal we need to be able to indicate this to the client if they ask about changes in that range). For the convenience of the client we also include the `toHash` (the most recent hash from the journal entry) which is likely useful for the `hg` dirstate operations; it is useful to know that the snapshot may have changed since the last query about the dirstate. The `getFileInformation` method returns the instantaneously available `stat()` like information about the requested list of files. Since we simply don't have historical data on how files in the overlay looked (only how they look now), this method does not allow passing in a JournalPosition. When it comes to comparing historical data, we will need to add an API that accepts two snapshot hashes and generates the results from there. This particular method is geared up to understanding the current state of the world; the obvious use case is plugging in the file list from `getFilesChangedSince` into this function to figure out what's what. * Do we want a function that combines `getFilesChangedSince` + `getFileInformation` into a single RPC? Why is there a glob method? It's to support a use-case in the watchman/buck integration. I'm just sketching it out in the thrift interface at this stage. In the future we also need to be able to express how to carry out a tree walk, but that will require some query predicates that I don't want to get hung up on specifying immediately. Why is the streaming stuff in its own thrift file? We can't generate code for it in java or perhaps also python. It's only needed to plumb data into watchman so it's broken out into its own definition. Nothing depends on that file yet, so it's probably not specified quite right. The important thing is how the subscribe method looks: it's essentially the same as the method to query a delta, but it keeps emitting deltas as they are produced. This is another API that will benefit from query predicates when we get around to specifying them. I've added `JournalDelta::fromHash` and `JournalDelta::toHash` to hold the appropriate snapshot ids in the journal entry; this will allow us to indicate when we've checked out a new snapshot, or created a new snapshot. We have no way to populate these yet; I commented on D3762646 about storing the `snapshotID` that we have during `EdenServiceHandler::mountImpl` into either the `EdenMount` or the proposed `RootInode` class. Once we have that we can simply sample it and store it as we generate `JournalDelta`s. Reviewed By: simpkins Differential Revision: D3860804 fbshipit-source-id: 896c24c354e6f58328fb45c24b16915d9e937108
2016-09-19 22:48:12 +03:00
items = info.fileInfo
self.assertEqual(2, len(items))
self.assertTrue(stat.S_ISDIR(items[''].mode))
self.assertTrue(stat.S_ISREG(items['overlaid'].mode))
self.assertEqual(6, items['overlaid'].size)
self.assertNotEqual(0, items['overlaid'].mtime.seconds)
name = os.path.join(self.mount, 'adir', 'file')
with open(name, 'a') as f:
pos = self.client.getCurrentJournalPosition(self.mount)
self.assertEqual(3, pos.sequenceNumber,
msg='journal still in same place for append')
f.write('more stuff on the end\n')
pos = self.client.getCurrentJournalPosition(self.mount)
self.assertEqual(4, pos.sequenceNumber,
msg='appending bumps the journal')
changed = self.client.getFilesChangedSince(
self.mount, pos_after_overlaid)
self.assertEqual(['adir/file'], changed.paths)
self.assertEqual(pos_after_overlaid.sequenceNumber + 1,
changed.fromPosition.sequenceNumber,
msg='changes start AFTER pos_after_overlaid')
additional query API for our thrift interface Summary: This diff adds a couple more things to our thrift interface: 1. Introduces JournalPosition 2. Adds methods to query the current JournalPosition and obtain a delta since a given JournalPosition 3. Augments getMaterializedFiles to also return the current JournalPosition 4. Adds a method to evaluate a `glob` against Eden 5. Adds a method using thrift streaming to subscribe to realtime changes Could probably finesse the naming a little bit. The JournalPosition allows reasoning about changes to files that are not part of an Eden snapshot. Internally the journal position is just the SequenceNumber from the journal datastructures, but when we expose it to clients we need to be able to distinguish between a sequence number from the current instance of the eden service and a prior incarnation (eg: if the process has been restarted, and we have no way to recreate the journal we need to be able to indicate this to the client if they ask about changes in that range). For the convenience of the client we also include the `toHash` (the most recent hash from the journal entry) which is likely useful for the `hg` dirstate operations; it is useful to know that the snapshot may have changed since the last query about the dirstate. The `getFileInformation` method returns the instantaneously available `stat()` like information about the requested list of files. Since we simply don't have historical data on how files in the overlay looked (only how they look now), this method does not allow passing in a JournalPosition. When it comes to comparing historical data, we will need to add an API that accepts two snapshot hashes and generates the results from there. This particular method is geared up to understanding the current state of the world; the obvious use case is plugging in the file list from `getFilesChangedSince` into this function to figure out what's what. * Do we want a function that combines `getFilesChangedSince` + `getFileInformation` into a single RPC? Why is there a glob method? It's to support a use-case in the watchman/buck integration. I'm just sketching it out in the thrift interface at this stage. In the future we also need to be able to express how to carry out a tree walk, but that will require some query predicates that I don't want to get hung up on specifying immediately. Why is the streaming stuff in its own thrift file? We can't generate code for it in java or perhaps also python. It's only needed to plumb data into watchman so it's broken out into its own definition. Nothing depends on that file yet, so it's probably not specified quite right. The important thing is how the subscribe method looks: it's essentially the same as the method to query a delta, but it keeps emitting deltas as they are produced. This is another API that will benefit from query predicates when we get around to specifying them. I've added `JournalDelta::fromHash` and `JournalDelta::toHash` to hold the appropriate snapshot ids in the journal entry; this will allow us to indicate when we've checked out a new snapshot, or created a new snapshot. We have no way to populate these yet; I commented on D3762646 about storing the `snapshotID` that we have during `EdenServiceHandler::mountImpl` into either the `EdenMount` or the proposed `RootInode` class. Once we have that we can simply sample it and store it as we generate `JournalDelta`s. Reviewed By: simpkins Differential Revision: D3860804 fbshipit-source-id: 896c24c354e6f58328fb45c24b16915d9e937108
2016-09-19 22:48:12 +03:00
info = self.client.getMaterializedEntries(self.mount)
self.assertEqual(pos, info.currentPosition,
msg='consistent with getCurrentJournalPosition')
additional query API for our thrift interface Summary: This diff adds a couple more things to our thrift interface: 1. Introduces JournalPosition 2. Adds methods to query the current JournalPosition and obtain a delta since a given JournalPosition 3. Augments getMaterializedFiles to also return the current JournalPosition 4. Adds a method to evaluate a `glob` against Eden 5. Adds a method using thrift streaming to subscribe to realtime changes Could probably finesse the naming a little bit. The JournalPosition allows reasoning about changes to files that are not part of an Eden snapshot. Internally the journal position is just the SequenceNumber from the journal datastructures, but when we expose it to clients we need to be able to distinguish between a sequence number from the current instance of the eden service and a prior incarnation (eg: if the process has been restarted, and we have no way to recreate the journal we need to be able to indicate this to the client if they ask about changes in that range). For the convenience of the client we also include the `toHash` (the most recent hash from the journal entry) which is likely useful for the `hg` dirstate operations; it is useful to know that the snapshot may have changed since the last query about the dirstate. The `getFileInformation` method returns the instantaneously available `stat()` like information about the requested list of files. Since we simply don't have historical data on how files in the overlay looked (only how they look now), this method does not allow passing in a JournalPosition. When it comes to comparing historical data, we will need to add an API that accepts two snapshot hashes and generates the results from there. This particular method is geared up to understanding the current state of the world; the obvious use case is plugging in the file list from `getFilesChangedSince` into this function to figure out what's what. * Do we want a function that combines `getFilesChangedSince` + `getFileInformation` into a single RPC? Why is there a glob method? It's to support a use-case in the watchman/buck integration. I'm just sketching it out in the thrift interface at this stage. In the future we also need to be able to express how to carry out a tree walk, but that will require some query predicates that I don't want to get hung up on specifying immediately. Why is the streaming stuff in its own thrift file? We can't generate code for it in java or perhaps also python. It's only needed to plumb data into watchman so it's broken out into its own definition. Nothing depends on that file yet, so it's probably not specified quite right. The important thing is how the subscribe method looks: it's essentially the same as the method to query a delta, but it keeps emitting deltas as they are produced. This is another API that will benefit from query predicates when we get around to specifying them. I've added `JournalDelta::fromHash` and `JournalDelta::toHash` to hold the appropriate snapshot ids in the journal entry; this will allow us to indicate when we've checked out a new snapshot, or created a new snapshot. We have no way to populate these yet; I commented on D3762646 about storing the `snapshotID` that we have during `EdenServiceHandler::mountImpl` into either the `EdenMount` or the proposed `RootInode` class. Once we have that we can simply sample it and store it as we generate `JournalDelta`s. Reviewed By: simpkins Differential Revision: D3860804 fbshipit-source-id: 896c24c354e6f58328fb45c24b16915d9e937108
2016-09-19 22:48:12 +03:00
items = info.fileInfo
self.assertEqual(4, len(items))
self.assertTrue(stat.S_ISDIR(items[''].mode))
self.assertTrue(stat.S_ISREG(items['overlaid'].mode))
self.assertEqual(6, items['overlaid'].size)
self.assertNotEqual(0, items['overlaid'].mtime.seconds)