Investigate why numbers seen in scan trace do not match configuration #6022

@keith-turner

Describe the bug

While testing #6010, an attempt was made to retrieve a large batch of key-values by adjusting tablet server and scanner settings. The scanners returned batches that appeared to be smaller than the configured maximum sizes. It's possible some configuration was missed. This needs further investigation.

To Reproduce

Apply the following diff to the code in #6010. When the test runs, it prints all of the spans; each span corresponds to a scan batch. The max batch size (Property.TABLE_SCAN_MAXMEM) was set to 1M, and the scanner was configured to allow up to 10000 entries per batch. However, each span shows numbers like accumulo.scan.entries.returned=797 and accumulo.scan.bytes.returned=820113. It seems the bytes returned should be closer to 1M.

diff --git a/test/src/main/java/org/apache/accumulo/test/tracing/ScanTraceClient.java b/test/src/main/java/org/apache/accumulo/test/tracing/ScanTraceClient.java
index 2196302fea..ac0ecdb410 100644
--- a/test/src/main/java/org/apache/accumulo/test/tracing/ScanTraceClient.java
+++ b/test/src/main/java/org/apache/accumulo/test/tracing/ScanTraceClient.java
@@ -57,6 +57,7 @@ public class ScanTraceClient {
         scanner.setRange(new Range(startRow, true, endRow, false));
       }
       setColumn(scanner);
+      scanner.setBatchSize(10_000);
     }
 
     void conigureScanner(BatchScanner scanner) {
diff --git a/test/src/main/java/org/apache/accumulo/test/tracing/ScanTracingIT.java b/test/src/main/java/org/apache/accumulo/test/tracing/ScanTracingIT.java
index 1d4e70d302..818b61111d 100644
--- a/test/src/main/java/org/apache/accumulo/test/tracing/ScanTracingIT.java
+++ b/test/src/main/java/org/apache/accumulo/test/tracing/ScanTracingIT.java
@@ -36,6 +36,9 @@ import java.util.stream.Collectors;
 import java.util.stream.IntStream;
 
 import org.apache.accumulo.core.client.Accumulo;
+import org.apache.accumulo.core.client.AccumuloException;
+import org.apache.accumulo.core.client.AccumuloSecurityException;
+import org.apache.accumulo.core.client.TableExistsException;
 import org.apache.accumulo.core.client.admin.NewTableConfiguration;
 import org.apache.accumulo.core.conf.Property;
 import org.apache.accumulo.core.data.TableId;
@@ -92,6 +95,38 @@ class ScanTracingIT extends ConfigurableMacBase {
     collector.stop();
   }
 
+  @Test
+  public void testLargeBatch() throws Exception {
+    var tableName = getUniqueNames(1)[0];
+
+    try (var client = Accumulo.newClient().from(getClientProperties()).build()) {
+      var ingestParams = new TestIngest.IngestParams(getClientProperties(), tableName);
+      ingestParams.createTable = false;
+      ingestParams.rows = 10000;
+      ingestParams.cols = 10;
+      var ntc = new NewTableConfiguration().setProperties(Map.of(Property.TABLE_SCAN_MAXMEM.getKey(), "1M"));
+      client.tableOperations().create(tableName, ntc);
+      TestIngest.ingest(client, ingestParams);
+      client.tableOperations().flush(tableName, null, null, true);
+
+      var scanOpts = new ScanTraceClient.Options(tableName);
+      var scanResults = run(scanOpts);
+
+      System.out.println("results : "+scanResults);
+
+      while(true) {
+        var spanData = collector.take();
+        if(spanData.traceId.equals(scanResults.traceId1) || spanData.traceId.equals(scanResults.traceId2)) {
+          if(spanData.name.contains("scan-batch")) {
+            System.out.println(spanData);
+          }
+        }
+      }
+
+    }
+  }
+
   @Test
   public void test() throws Exception {
     var names = getUniqueNames(7);

Expected behavior

Either the trace data is closer to the configured limits, or we understand why there is a difference.
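
To put a number on the gap, here is a small arithmetic sketch using only the two values from the example span above. It assumes that "1M" means 2^20 bytes, that entries are roughly uniform in size, and that accumulo.scan.bytes.returned counts the same bytes the batch limit is applied to. None of these assumptions is confirmed here. The class and method names are made up for illustration and are not part of Accumulo.

```java
public class BatchGap {
  // Assumption: the "1M" setting is interpreted as 2^20 bytes.
  static final long ONE_MIB = 1L << 20;

  // Average size of one returned entry, in bytes.
  static double avgBytesPerEntry(long bytesReturned, long entriesReturned) {
    return (double) bytesReturned / entriesReturned;
  }

  // Fraction of the configured byte limit that the batch actually used.
  static double fractionOfLimit(long bytesReturned, long limitBytes) {
    return (double) bytesReturned / limitBytes;
  }

  // Number of whole entries of the given size that fit under the byte limit.
  static long entriesThatFit(long limitBytes, long bytesPerEntry) {
    return limitBytes / bytesPerEntry;
  }

  public static void main(String[] args) {
    long bytes = 820_113L; // accumulo.scan.bytes.returned from the example span
    long entries = 797L;   // accumulo.scan.entries.returned from the example span

    double avg = avgBytesPerEntry(bytes, entries);
    System.out.printf("avg bytes/entry   : %.1f%n", avg);
    System.out.printf("fraction of limit : %.3f%n", fractionOfLimit(bytes, ONE_MIB));
    System.out.printf("entries that fit  : %d%n", entriesThatFit(ONE_MIB, Math.round(avg)));
  }
}
```

Under these assumptions each entry averages 1029 bytes, so 10000 entries would be roughly 10 MB and the entry limit should not be what ends a batch. The byte limit alone would allow about 1019 entries per batch, compared with the 797 observed.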

Labels

bug: This issue has been verified to be a bug.